In 2023, a new threat to LLM control was discovered when a Stanford University student decided on a study. This study became one of the most cybersecurity-relevant security tests in AI history. The outcome showed that LLMs can be manipulated to override their preset instructions without tools, using only well-crafted natural language.
As security experts and LLM programmers develop safeguards and solutions, attackers evolve new ways to bypass them. Understanding prompt injections in LLMs is the first step for developers navigating the risks they pose.
All LLMs have a set of preset rules that govern what information they provide and how they generally approach their tasks. Prompt injection attacks put a spin on this. They are a form of cyberattack that seeks to manipulate LLMs by inserting malicious prompts that trick the model into ignoring its original system prompts.
Unlike most cybersecurity threats, which require code or specialized tools, prompt injection works through natural language. It exploits a fundamental limitation of LLMs: their inability to distinguish between developer instructions and user-provided data. This problem escalates as LLMs gain traction in various vital fields.securit
Furthermore, malicious prompts can be hidden on webpages, in images, in documents, or even in emails. This makes them very difficult to defend against.
Prompt injection attacks take advantage of the way natural language input is processed by LLMs. LLMs function by predicting the next tokens in a sequence, considering their training and the provided context. When an application transmits a system prompt to the model, along with user input, it treats the entire sequence as a single context.
Attackers can exploit this gap by crafting inputs to alter how the LLM responds to subsequent prompts. When this occurs, the LLM finds it difficult to distinguish between legitimate instructions and malicious input. To offer useful responses, it may inadvertently adhere to instructions provided by attackers if those instructions are seamlessly integrated into user queries.
This is one of the more serious security risks of AI-generated code — the model itself becomes a vector, producing outputs that look legitimate but carry embedded vulnerabilities or backdoors that traditional code review may never catch.
Prompt injection is often categorized into two types:
Direct prompt injection often operates through the interface of an LLM, where an attacker-crafted input is designed to get the LLM to circumvent its safety guardrails and content policies. It can be done as many times as possible until the attacker achieves their objective, hence the term "jailbreaking."
Indirect prompt injection takes a slightly different approach. Rather than direct input in a chat-like interface, the malicious instructions are embedded into a different content type (audio, documents, emails, etc.) These content types are then processed by the model and influence the model’s behavior.
There are a couple of real-life examples of prompt injection attacks. Two key examples are:
The Bing Chat Sydney Incident is one of the most notable cases of prompt injection, in which Stanford student Kevin Liu performed a prompt-injection attack that revealed the internal workings of Microsoft’s new AI-powered search advisor.
Two things developers can learn to better protect their systems are, first, that architecture is key. LLM architecture often merges user inputs and model instructions in the same context, allowing user input to manipulate models. Design your architecture so the system message, tool instructions, and user message are transmitted and enforced over separate, protected channels, with middleware that prevents unvetted user content from being merged or executed as instructions.
Secondly, enforce strict conversation tokens and turn limits. Direct prompt injection benefits from long conversations where various jailbreaking prompts can be tested. Use guardrails like conversation limits, instruction reinforcement, and context trimming to reduce the risks associated with prompt injection.
Recognizing the dangers that prompt injection poses is a first step to preventing it. Here are some other things developers can do to prevent prompt-injection attacks in 2026.
Developers should implement the following security controls to protect their LLM applications from various types of attacks:
To learn more about the security risks of AI-generated code and how to manage them, read our blog!
The best way for developers to test for prompt injection vulnerabilities is through active simulations of adversarial behaviors.
Developers can start by conducting prompt-injection red-team testing to test common injection patterns. They can perform adversarial prompt testing by varying prompts and role-playing scenarios. They can also evaluate tool and API abuse scenarios. They can also test indirect prompt injection by injecting malicious instructions into various data types. These and more steps are ones they can take and are encouraged in Security Journey’s hands-on, role-based trainings.
The best way to build real-world skills to defend against prompt injection is through practical training and exposure to these vulnerabilities.
The rapid advantages that LLMs present could be lost if security is not prioritized. Prioritizing security means proper training and continuous education for developers. The best way to stay ahead of adversaries who try to inject malicious prompts, either directly or indirectly, is to provide hands-on training that lets developers interact with these prompts and develop practical solutions.
Security Journey provides hands-on training, practical insights, and an ever-evolving system that addresses a range of security threats.
As systems become increasingly dependent on LLMs and AI chatbots, it has become imperative for developers to strengthen their security skills to protect their systems and prevent attacks. Schedule a demo today!