Reranker Helps, but Not Enough: Towards Strong Poisoning Attacks Against RAG

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM, RAG, Data Poisoning Attacks
Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models with timely, external information, making their retrieval corpora a prime target for data poisoning. However, existing targeted poisoning attacks exhibit limited effectiveness against RAG systems equipped with a reranker to enhance retrieval quality. Remarkably, this defensive benefit comes at no additional cost: a reranker fine-tuned only on benign, in-domain documents can effectively filter malicious content without any adversarial training. To evaluate RAG realistically and strengthen red-teaming efforts, we distill practical prompt design principles that reveal reranker blind spots. Building on these insights, we introduce the $\textbf{P}$rompt-$\textbf{P}$erturbation $\textbf{P}$oisoning $\textbf{A}$ttack ($\mathbf{P}^3 \mathbf{A}$), a novel framework for generating sophisticated poisoned documents. $\text{P}^3\text{A}$ first employs rule-based prompt engineering to craft initial poisoned texts designed to evade reranker filtering. It then injects subtle character-level perturbations into these texts, which promote their ranking by the reranker while preserving their adversarial effectiveness. These perturbations alter only about 1\% of the text, so the poisoned documents remain natural and readable. Extensive experiments demonstrate that our method achieves strong attack performance, compromising reranker-enhanced RAG pipelines. Furthermore, the method exhibits strong transferability, proving equally effective against vanilla RAG and thus offering a more realistic and challenging benchmark for evaluating defense mechanisms. Code is available in the supplementary material.
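A minimal sketch of the character-level perturbation idea described in the abstract, assuming a hypothetical poisoned passage and a ~1% character budget. This is not the authors' $\text{P}^3\text{A}$ procedure: the paper selects perturbations that raise the reranker's score, whereas this illustration simply applies random adjacent-character swaps within the stated budget to show how small the textual change is.

```python
import random

def perturb_characters(text: str, budget: float = 0.01, seed: int = 0) -> str:
    """Illustrative character-level perturbation: swap adjacent letters at a
    small fraction (~1%) of positions, leaving the rest of the text untouched
    so the document stays natural and readable.

    Hypothetical sketch only; the actual attack chooses perturbations to
    promote the document's reranker ranking rather than sampling at random.
    """
    rng = random.Random(seed)
    chars = list(text)
    n_edits = max(1, int(len(chars) * budget))
    # Candidate positions where swapping two adjacent letters keeps the text plausible.
    positions = [i for i in range(len(chars) - 1)
                 if chars[i].isalpha() and chars[i + 1].isalpha()]
    for i in rng.sample(positions, min(n_edits, len(positions))):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Example with a hypothetical poisoned passage before corpus injection.
poisoned = "The capital of France is Berlin, according to the latest official records."
print(perturb_characters(poisoned))
```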
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6954