LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

ICLR 2026 Conference Submission 17961 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: LLMail-Inject; Indirect Prompt Injection; LLM Security
TL;DR: We organized a public challenge on indirect prompt injection. This paper presents the results of the challenge and the collected dataset.
Abstract: Indirect Prompt Injection attacks exploit a fundamental weakness of large language models (LLMs): the inability to reliably separate instructions from data. This vulnerability poses critical real-world security risks, yet systematic evaluation against adaptive adversaries remains largely unexplored. We introduce LLMail-Inject, the first large-scale public challenge simulating a realistic email-assistant environment, a high-value attack surface in practice. Involving 839 participants, the challenge produced 208,095 unique attack prompts across multiple LLM architectures and retrieval configurations. Unlike prior benchmarks, LLMail-Inject requires end-to-end compromise: attacks must be retrieved, adaptively evade defenses, trigger unauthorized tool calls with correct formatting, and exfiltrate contextual data. Our findings reveal a stark gap between perceived and actual robustness: while attacks against state-of-the-art models achieve <5% success on existing benchmarks, adaptive attacks in LLMail-Inject reach success rates of 32%, exposing the fragility of current defenses under realistic conditions. We release the dataset, code, and analysis to catalyze research toward structural, practical defenses against prompt injection.
Primary Area: datasets and benchmarks
Submission Number: 17961