Track: Systems and infrastructure for Web, mobile, and WoT
Keywords: retrieval-augmented generation, traceback, poisoning attack
Abstract: Large language models (LLMs) integrated with retrieval-augmented generation (RAG) systems improve accuracy by drawing on an external knowledge database. However, recent studies have exposed RAG's vulnerability to poisoning attacks, in which an attacker injects poisoned texts into the knowledge database so that the LLM produces attacker-desired responses. Existing defenses, which focus primarily on inference-time mitigation, have proven inadequate against sophisticated attacks. In this paper, we present RAGForensics, the first traceback system for RAG, which traces poisoned texts in the knowledge database. RAGForensics narrows the space of potentially poisoned texts and accurately identifies them without requiring access to model gradients, which are commonly unavailable in RAG systems. Our empirical evaluation on multiple datasets demonstrates the effectiveness of RAGForensics against state-of-the-art and adaptive poisoning attacks. This work pioneers the exploration of poisoned-text traceback in RAG systems, offering a practical and promising approach to securing them against poisoning attacks.
Submission Number: 2564