RAGuard: A Layered Defense Framework for Retrieval-Augmented Generation Systems Against Data Poisoning

Published: 08 Nov 2025, Last Modified: 24 Nov 2025 | ResponsibleFM @ NeurIPS 2025 | Readers: Everyone | License: CC BY 4.0
Keywords: Retrieval-Augmented Generation, Large Language Models, Data Poisoning Defense, Adversarial Robustness, Trustworthy AI, Model Safety, Secure NLP, Self-Healing AI, Black-box Defenses, Counterfactual Inference, Benchmarking, Evaluation Protocols, Foundation Models, Responsible AI, Information Retrieval Security
TL;DR: RAGuard is a two-layer defense for Retrieval-Augmented Generation that combines adversarial retriever training and a zero-knowledge filter to block poisoning, keeping LLMs robust and trustworthy under unseen attacks.
Abstract: Retrieval-Augmented Generation (RAG) systems are increasingly used to augment large language models (LLMs) with factual knowledge, yet they remain highly vulnerable to data poisoning, i.e., maliciously injected passages that manipulate the retrieved evidence. We introduce RAGuard, a layered, two-step defense framework that combines retrieval-level adversarial training with a novel zero-knowledge inference patch. The first step fine-tunes dense retrievers (e.g., Contriever; the approach is also compatible with BGE and others) on synthetic poisoned documents containing poisons such as fabricated facts, contradictions, and reasoning traps, training the retrievers to downrank malicious passages. The second step applies the zero-knowledge inference patch, a black-box method that identifies and filters suspicious documents based on their causal influence on QA correctness, without requiring poison labels. Experiments on Natural Questions (NQ) and Benchmarking-IR (BEIR) show that RAGuard improves robustness by reducing the Attack Success Rate (ASR) while maintaining retrieval quality (Recall@5, MRR). Together, these layers offer an efficient, label-free defense against both known and unseen poisoning attacks, establishing a general framework for resilient, self-healing RAG pipelines.
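The second step can be illustrated with a minimal leave-one-out sketch. This is not the authors' implementation: the abstract does not specify how causal influence is measured, so the sketch assumes that a document is suspicious when removing it alone changes the generated answer. The names `leave_one_out_filter`, `answer_fn`, and `toy_answer_fn` are hypothetical, and the toy reader merely stands in for a black-box LLM call.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature for a black-box QA call: (question, docs) -> answer.
AnswerFn = Callable[[str, List[str]], str]


def leave_one_out_filter(
    question: str, docs: Sequence[str], answer_fn: AnswerFn
) -> Tuple[List[str], List[int]]:
    """Flag documents whose removal changes the answer (assumed influence proxy).

    Returns (kept_docs, flagged_indices). No poison labels are used.
    """
    docs = list(docs)
    full_answer = answer_fn(question, docs)
    flagged = []
    for i in range(len(docs)):
        # Counterfactual: answer the question without document i.
        reduced = docs[:i] + docs[i + 1:]
        if answer_fn(question, reduced) != full_answer:
            flagged.append(i)
    kept = [d for i, d in enumerate(docs) if i not in flagged]
    return kept, flagged


def toy_answer_fn(question: str, docs: List[str]) -> str:
    """Toy stand-in for an LLM: an 'OVERRIDE:' passage dominates, else majority vote."""
    if not docs:
        return ""
    for d in docs:
        if d.startswith("OVERRIDE:"):
            return d.split(":", 1)[1].strip()
    return Counter(d.strip() for d in docs).most_common(1)[0][0]


retrieved = ["Paris", "Paris", "OVERRIDE: Lyon", "Paris"]
kept, flagged = leave_one_out_filter(
    "What is the capital of France?", retrieved, toy_answer_fn
)
print(kept, flagged)
```

In this toy setting only the overriding passage is flagged, because it is the only document whose removal flips the answer. The sketch costs one extra model call per retrieved document, and a real system would likely need a softer influence score than exact answer mismatch.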
Submission Number: 129