RAGuard: A Layered Defense Framework for Retrieval-Augmented Generation Systems Against Data Poisoning
Keywords: Retrieval-Augmented Generation, Large Language Models, Data Poisoning Defense, Adversarial Robustness, Trustworthy AI, Model Safety, Secure NLP, Self-Healing AI, Black-box Defenses, Counterfactual Inference, Benchmarking, Evaluation Protocols, Foundation Models, Responsible AI, Information Retrieval Security
Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly used to augment large language models (LLMs) with factual knowledge, yet they remain highly vulnerable to data poisoning, i.e., maliciously injected passages that manipulate the retrieved evidence. We introduce RAGuard, a layered two-step defense framework that combines retrieval-level adversarial training with a novel zero-knowledge inference patch. The first step fine-tunes dense retrievers (e.g., Contriever; the approach is also compatible with BGE and others) on synthetic poisoned documents containing fabricated facts, contradictions, and reasoning traps, training them to downrank malicious passages. The second step applies the zero-knowledge inference patch, a black-box method that identifies and filters suspicious documents based on their causal influence on QA correctness, without requiring poison labels. Experiments on Natural Questions (NQ) and Benchmarking-IR (BEIR) show that RAGuard improves robustness by reducing the Attack Success Rate (ASR) while maintaining retrieval quality (Recall@5, MRR). Together, these layers offer an efficient, label-free defense against both known and unseen poisoning attacks, establishing a general framework for resilient, self-healing RAG pipelines.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Ethics, Bias, and Fairness, Information Retrieval and Text Mining, Question Answering, Resources and Evaluation, Semantics: Lexical and Sentence-Level
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8696
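The abstract reports three metrics: Recall@5, MRR, and Attack Success Rate (ASR). The following is a minimal Python sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the use of string passage IDs, and the ASR success criterion (a case-insensitive substring match against an attacker's target answer) are all assumptions made for illustration.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the relevant passages that appear among the top-k results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevant_sets: Sequence[Set[str]]
) -> float:
    """Average reciprocal rank over all queries."""
    if not rankings:
        return 0.0
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


def attack_success_rate(answers: Sequence[str], attacker_targets: Sequence[str]) -> float:
    """Share of queries whose answer contains the attacker's target string.

    Assumed criterion: case-insensitive substring match. The paper may
    define attack success differently.
    """
    if not answers:
        return 0.0
    successes = sum(
        target.lower() in answer.lower()
        for answer, target in zip(answers, attacker_targets)
    )
    return successes / len(answers)
```

Under these definitions, a defense is doing its job when ASR falls while Recall@5 and MRR stay close to their values on the unpoisoned corpus.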