Traceback of Poisoned Texts in Poisoning Attacks to Retrieval-Augmented Generation

Baolei Zhang; Haoran Xin; Minghong Fang; Zhuqing Liu; Biao Yi; Tong Li; Zheli Liu

Traceback of Poisoned Texts in Poisoning Attacks to Retrieval-Augmented Generation

Baolei Zhang, Haoran Xin, Minghong Fang, Zhuqing Liu, Biao Yi, Tong Li, Zheli Liu

Published: 29 Jan 2025, Last Modified: 29 Jan 2025WWW 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Track: Systems and infrastructure for Web, mobile, and WoT

Keywords: retrieval-augmented generation, traceback, poisoning attack

Abstract: Large language models (LLMs) integrated with retrieval-augmented generation (RAG) systems enhance accuracy by accessing external knowledge database. However, recent studies have exposed RAG's vulnerability to poisoning attacks, where an attacker inject poisoned texts into the knowledge database, leading to attacker-desired responses. Existing defenses, primarily focused on inference-time mitigation, have proven inadequate against sophisticated attacks. In this paper, we present the first traceback system in RAG, RAGForensics, which traces poisoned texts from the knowledge database. RAGForensics narrows the space of potentially poisoned texts and accurately identifies them without requiring access to model gradients, a common challenge in RAG systems. Our empirical evaluation on multiple datasets demonstrates RAGForensics's effectiveness against state-of-the-art and adaptive poisoning attacks. This work pioneers the exploration of poisoned texts traceback in RAG systems, offering a practical and promising approach to securing them against poisoning attacks.

Submission Number: 2564

Loading