RAGuard: A Layered Defense Framework for Retrieval-Augmented Generation Systems Against Data Poisoning

ACL ARR 2026 January Submission8696 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, License: CC BY 4.0

Keywords: Retrieval-Augmented Generation, Large Language Models, Data Poisoning Defense, Adversarial Robustness, Trustworthy AI, Model Safety, Secure NLP, Self-Healing AI, Black-box Defenses, Counterfactual Inference, Benchmarking, Evaluation Protocols, Foundation Models, Responsible AI, Information Retrieval Security

Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly used to augment large language models (LLMs) with factual knowledge, yet they remain highly vulnerable to data poisoning, i.e., maliciously injected passages that manipulate retrieved evidence. We introduce RAGuard, a layered two-step defense framework that combines retrieval-level adversarial training with a novel zero-knowledge inference patch. The first step fine-tunes dense retrievers (e.g., Contriever; the approach is compatible with BGE and others) on synthetic poisoned documents built from poisons such as fabricated facts, contradictions, and reasoning traps, training the retrievers to downrank malicious passages. The second step applies the zero-knowledge inference patch, a black-box procedure that identifies and filters suspicious documents based on their causal influence on QA correctness, without requiring poison labels. Experiments on Natural Questions (NQ) and Benchmarking-IR (BEIR) show that RAGuard improves robustness by reducing the Attack Success Rate (ASR) while maintaining retrieval quality (Recall@5, MRR). Together, these layers offer an efficient and label-free defense against both known and unseen poisoning attacks, establishing a general framework for resilient, self-healing RAG pipelines.
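
The abstract does not spell out how the second step measures causal influence. One plausible reading is a leave-one-out counterfactual: drop each retrieved document in turn, regenerate the answer, and flag documents whose removal improves a correctness score. The Python sketch below illustrates that reading only; it is not the authors' implementation. The names `leave_one_out_filter`, `answer_fn`, `correct_fn`, and `threshold` are hypothetical, and the toy reader and scorer stand in for a real LLM and a real correctness signal.

```python
from typing import Callable, List, Sequence


def leave_one_out_filter(
    question: str,
    docs: Sequence[str],
    answer_fn: Callable[[str, Sequence[str]], str],  # hypothetical black-box QA call
    correct_fn: Callable[[str], float],              # hypothetical correctness score
    threshold: float = 0.0,
) -> List[str]:
    """Keep documents whose removal does not raise the correctness score."""
    base = correct_fn(answer_fn(question, docs))
    kept = []
    for i, doc in enumerate(docs):
        # Counterfactual context: every retrieved document except the i-th.
        reduced = [d for j, d in enumerate(docs) if j != i]
        gain = correct_fn(answer_fn(question, reduced)) - base
        # A positive gain means the answer improves without this document,
        # so the document is treated as suspicious and dropped.
        if gain <= threshold:
            kept.append(doc)
    return kept


# Toy stand-ins (assumptions): a reader that is misled by any passage
# mentioning "Lyon", and a scorer that knows the reference answer.
def toy_answer(question: str, context: Sequence[str]) -> str:
    return "Lyon" if any("Lyon" in d for d in context) else "Paris"


def toy_correct(answer: str) -> float:
    return 1.0 if answer == "Paris" else 0.0


retrieved = [
    "Paris is the capital of France.",
    "The capital of France is Lyon.",  # injected poison
    "France is in Europe.",
]
filtered = leave_one_out_filter(
    "What is the capital of France?", retrieved, toy_answer, toy_correct
)
```

In this toy run only the poisoned passage is removed, since it is the only document whose absence changes the answer. The sketch needs one extra generation call per retrieved document and no poison labels, which matches the black-box, label-free framing in the abstract. How correctness is estimated at inference time is left open here.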

Paper Type: Long

Research Area: Information Extraction and Retrieval

Research Area Keywords: Ethics, Bias, and Fairness, Information Retrieval and Text Mining, Question Answering, Resources and Evaluation, Semantics: Lexical and Sentence-Level

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 8696
