RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation
Keywords: Corpus poisoning
Abstract: Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to enhance large language models (LLMs) with external knowledge, reducing hallucinations and compensating for outdated information. However, recent studies have exposed a critical vulnerability in RAG pipelines, corpus poisoning, in which adversaries inject malicious documents into the retrieval corpus to manipulate model outputs. In this work, we propose two complementary retrieval-stage defenses: RAGPart and RAGMask. Our defenses operate directly on the retriever, making them computationally lightweight and requiring no modification to the generation model. RAGPart leverages the inherent training dynamics of dense retrievers, exploiting document partitioning to mitigate the effect of poisoned points. In contrast, RAGMask identifies suspicious tokens based on significant similarity shifts under targeted token masking. Across two benchmarks, four poisoning strategies, and four state-of-the-art retrievers, our defenses consistently reduce attack success rates while preserving utility under benign conditions. To stress-test our defenses, we further introduce an interpretable attack, AdvRAGgen, which outperforms existing attacks across four different retrievers and two different datasets. Our findings highlight the potential and limitations of retrieval-stage defenses, providing practical insights for robust RAG deployments.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Information Retrieval and Text Mining
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 8495
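Illustrative sketch (not from the paper): the abstract states that RAGMask flags suspicious tokens by the similarity shift observed under token masking. The toy Python below shows one way such a shift could be measured. Everything in it is an assumption made for illustration: the bag-of-words `embed` stands in for a dense retriever encoder, masking is approximated by deleting one token at a time, and the names `similarity_shifts`, `flag_suspicious`, and the `threshold` value are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy bag-of-words embedding (assumed stand-in for a dense encoder)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_shifts(query_tokens, doc_tokens):
    """Return the base query-document similarity and, for each document
    token, how much the similarity drops when that token is removed."""
    query_vec = embed(query_tokens)
    base = cosine(query_vec, embed(doc_tokens))
    shifts = []
    for i in range(len(doc_tokens)):
        masked = doc_tokens[:i] + doc_tokens[i + 1:]
        shifts.append(base - cosine(query_vec, embed(masked)))
    return base, shifts


def flag_suspicious(query_tokens, doc_tokens, threshold):
    """Flag tokens whose removal lowers similarity by more than `threshold`
    (a hypothetical cutoff chosen for this example)."""
    _, shifts = similarity_shifts(query_tokens, doc_tokens)
    return [tok for tok, s in zip(doc_tokens, shifts) if s > threshold]


# A document stuffed with the query's keywords to attract retrieval.
query = ["capital", "france"]
poisoned_doc = ["buy", "cheap", "pills", "capital", "france"]
flagged = flag_suspicious(query, poisoned_doc, threshold=0.1)
```

In this toy setting only the stuffed query keywords carry the similarity, so removing either one produces a large drop while removing any other token does not. A real retriever would need an actual encoder and a calibrated threshold, neither of which is specified in the abstract.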