Indirect Prompt Injection
Indirect prompt injection is the variant of prompt injection where the attacker never communicates with the model directly; instructions are planted in content the model later retrieves, such as web pages, emails, documents, or RAG sources.
Indirect prompt injection is the form of prompt injection in which the attacker plants instructions in content that an LLM application will later ingest, rather than typing the instructions into the chat box themselves. It was named and systematically characterized by Kai Greshake and colleagues in the 2023 paper "Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection." Typical delivery channels include public web pages that the model browses, emails that an assistant summarizes, documents loaded into a RAG index, calendar invites, support tickets, code in repositories an assistant indexes, and images processed by a multimodal model that contain instructions in visible text, off-canvas pixels, or metadata. Because retrieval is the trigger, defenders cannot easily filter the attacker at ingress — the malicious content arrives by way of normal application behavior. Demonstrated impacts in research and production include data exfiltration to attacker-controlled URLs, unauthorized tool invocation, manipulation of summaries shown to the user, and lateral movement when one compromised context produces output that seeds another. Defenses overlap with general prompt-injection mitigations — spotlighting, dual-LLM patterns, capability constraints, and human-in-the-loop confirmation for sensitive actions — but indirect attacks are harder to spot because the user often never sees the injected text. See Prompt Injection in LLM Systems and OWASP LLM Top 10 for context.