Keywords: LLM, RAG, Data Poisoning Attacks
Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models with timely, external information, making their retrieval corpora a prime target for data poisoning.
However, existing targeted poisoning attacks exhibit limited effectiveness against RAG systems that employ a reranker to enhance retrieval quality.
Remarkably, this defensive benefit comes at no additional cost: a reranker fine-tuned only on benign, in-domain documents can effectively filter malicious content without any adversarial training.
To realistically evaluate RAG and strengthen red-teaming efforts, we distill practical prompt-design principles that expose reranker blind spots.
Building on these insights, we introduce the $\textbf{P}$rompt-$\textbf{P}$erturbation $\textbf{P}$oisoning $\textbf{A}$ttack ($\mathbf{P}^3 \mathbf{A}$), a novel framework for generating sophisticated poisoned documents.
$\text{P}^3\text{A}$ first employs rule-based prompt engineering to craft initial poisoned texts designed to evade reranker filtering.
It then injects subtle character-level perturbations into these texts, boosting their ranking by the reranker while preserving their adversarial effectiveness.
These perturbations introduce only about 1\% textual change, ensuring the poisoned texts remain natural and readable.
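As a rough illustration only (not the authors' released code), a minimal sketch of the two-stage pipeline might look like the following. The template wording, the reranker model name, the function names, and the greedy hill-climbing edit strategy are all assumptions made for exposition; the actual P3A rules and optimization procedure are described in the paper and supplementary code.

```python
# Hypothetical sketch of a P3A-style attack: (1) craft an initial poisoned passage
# with a rule-based template, (2) greedily apply character-level edits and keep only
# those that raise a cross-encoder reranker's relevance score, under a ~1% budget.
import random
from sentence_transformers import CrossEncoder  # generic cross-encoder reranker

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model


def craft_poison(question: str, target_answer: str) -> str:
    """Rule-based template (hypothetical) for an initial poisoned passage that
    reads like a relevant document while asserting the attacker's answer."""
    return (
        f"{question} Recent authoritative sources confirm that the answer is "
        f"{target_answer}. This is widely documented and considered settled."
    )


def rerank_score(query: str, passage: str) -> float:
    """Relevance score the reranker uses to order retrieved passages."""
    return float(reranker.predict([(query, passage)])[0])


def perturb(query: str, poisoned: str, budget_ratio: float = 0.01, seed: int = 0) -> str:
    """Hill-climb over single-character substitutions, keeping score-improving edits."""
    rng = random.Random(seed)
    best, best_score = poisoned, rerank_score(query, poisoned)
    max_edits = max(1, int(len(poisoned) * budget_ratio))  # ~1% of characters
    edits = 0
    for _ in range(20 * max_edits):  # bounded number of trials
        if edits >= max_edits:
            break
        i = rng.randrange(len(best))
        if not best[i].isalpha():
            continue
        candidate = best[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + best[i + 1:]
        score = rerank_score(query, candidate)
        if score > best_score:  # keep only edits the reranker prefers
            best, best_score, edits = candidate, score, edits + 1
    return best


# Usage sketch: poison = perturb(question, craft_poison(question, attacker_answer))
```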
Extensive experiments demonstrate that our method achieves strong attack performance, successfully compromising reranker-enhanced RAG pipelines.
Furthermore, our method exhibits strong transferability, proving equally effective against vanilla RAG—offering a more realistic and challenging benchmark for evaluating defense mechanisms.
Code is available in the supplementary material.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6954