Sweeping Promptable Spoofs under the DirtyRAG: A Practical, Query-Blind RAG Attack Done Right

Published: 01 Mar 2026, Last Modified: 24 Apr 2026, ICLR 2026 AIWILD, CC BY 4.0
Keywords: ML security, RAG attack, poisoning
TL;DR: The RAG attack surface is powerful, yet existing attacks are often query-dependent and impractical. We propose a query-blind RAG attack effective under realistic threat models, while advancing metrics and evaluation practices in this field.
Abstract: Retrieval-Augmented Generation (RAG) enables LLMs to efficiently review query-relevant material and deliver better answers. However, the same pipeline also introduces an additional attack surface: adversarial passages (a.k.a. "spoofs") can be injected into the knowledge bank and, once retrieved, mislead LLM outputs. Despite the widespread demand for RAG, the handful of existing attacks often share three critical shortcomings. (1) They are query-dependent, demanding an unrealistic level of privilege escalation and cost (such as real-time conversation surveillance, spoof crafting, and injection), which makes them entirely neutralizable by simple system measures such as data freezes or timeouts. (2) Their spoofs diverge so sharply from benign text that a trivial perplexity filter can reach an AUC of ≥ 0.9. (3) Their spoofs lack steerability: even when retrieved, they may fail to make the LLM reflect the attacker's intent. To bridge this gap, we present DirtyRAG, a query-blind, black-box, generation-based RAG attack that avoids all three issues. DirtyRAG can be flexibly prompted to deliver any intended payload while remaining robust against standard defenses. Additionally, we identify several lapses in existing RAG attack evaluations and introduce RAGAttack Bench, a rigorous benchmark designed to reflect real-world attack scenarios and to provide a polished testbed for future research in this critical domain.
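To make the "perplexity filter reaching ≥ 0.9 AUC" claim concrete, here is a minimal sketch of how such a detector is typically scored. The abstract does not say which language model or scoring setup the filter uses. This sketch therefore assumes that per-token natural-log probabilities from some reference LM are already available. It defines perplexity as exp(-mean log-prob) and computes a threshold-free ROC AUC via the Mann-Whitney statistic. The helper names `perplexity` and `roc_auc` and all log-prob values are hypothetical toy inputs, not data or code from the paper.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token).

    Assumption: the log-probs come from some reference LM scoring the passage.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def roc_auc(benign: Sequence[float], spoof: Sequence[float]) -> float:
    """ROC AUC of the rule 'flag a passage as a spoof if its score is high'.

    Computed as the Mann-Whitney probability that a random spoof scores higher
    than a random benign passage, with ties counted as 1/2.
    """
    if not benign or not spoof:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for s in spoof:
        for b in benign:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(benign) * len(spoof))


# Hypothetical per-token log-probs (toy values, not results from the paper).
benign_passages = [
    [-2.1, -1.8, -2.5, -2.0],
    [-3.0, -2.2, -2.7],
    [-1.5, -2.9, -2.4, -2.2],
]
spoof_passages = [
    [-6.2, -5.8, -7.1],        # gibberish-like spoof: very high perplexity
    [-2.4, -2.5, -2.2, -2.3],  # fluent spoof: overlaps with benign text
    [-5.5, -6.0, -4.9, -6.3],
]

benign_ppl = [perplexity(p) for p in benign_passages]
spoof_ppl = [perplexity(p) for p in spoof_passages]
print(f"AUC = {roc_auc(benign_ppl, spoof_ppl):.3f}")  # prints "AUC = 0.889"
```

An AUC of 0.5 means the filter does no better than chance. An AUC of 0.9 means that a randomly chosen spoof has higher perplexity than a randomly chosen benign passage 90% of the time. Only the fluent toy spoof, whose perplexity falls inside the benign range, pulls the score below 1.0 here.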
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 198