LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

09 May 2025 (modified: 30 Oct 2025)Submitted to NeurIPS 2025 Datasets and Benchmarks TrackEveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLMail-Inject; Indirect Prompt Injection; LLM Security
TL;DR: We organized a public challenge on indirect prompt inject. This paper presents the results of the challenge and the collected dataset.
Abstract: Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) in discriminating between instructions and data in their prompts. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks can have wide security and privacy implications, and many real-world LLM applications remain vulnerable. We present the results of LLMail-Inject, a public challenge simulating a realistic scenario where participants adaptively inject malicious instructions into emails to trigger unauthorized tool calls in an LLM-based email assistant. The challenge spanned multiple defense strategies, LLM architectures, and retrieval configurations, resulting in a dataset of 208,095 unique attack submissions. We release the challenge code, the full dataset of submissions, and our analysis demonstrating how this data can provide new insights into the instruction-data separation problem. We hope that this will serve as a foundation for future research on practical and structural solutions to prompt injection.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/microsoft/llmail-inject-challenge
Code URL: https://github.com/microsoft/llmail-inject-challenge-analysis
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 984
Loading