DEAL: High-Efficacy Privacy Attack on Retrieval-Augmented Generation Systems via LLM Optimizer

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Retrieval-Augmented Generation, Data Privacy
TL;DR: We use an LLM to optimize attack strings that efficiently steal data from the private database behind a RAG model.
Abstract: Retrieval-Augmented Generation (RAG) provides a powerful means of combining private databases with large language models (LLMs). In a typical RAG system, a set of documents is retrieved from a private database and inserted into the final prompt, which is then fed into the LLM. Existing research has shown that an attacker can use a simple, manually designed attack suffix to induce the LLM to output the private documents in the prompt with high probability. In this paper, we demonstrate that the privacy leakage risk revealed by such simple manual attack suffixes is significantly underestimated. We propose a novel attack method called Documents Extraction Attack via LLM-Optimizer (DEAL). DEAL leverages an LLM as an optimizer to iteratively refine attack strings, inducing the RAG model to reveal private data in its responses. Notably, our attack requires no knowledge of the target LLM, including its gradient information or model type; it can be executed solely through query access to the RAG system. We evaluate the effectiveness of our attack on multiple LLM architectures, including Qwen2, Llama3.1, and GPT-4o, across different attack tasks such as Entire Documents Extraction and Personally Identifiable Information (PII) Extraction. Under the same access setting as existing methods, our method achieves a Mean ROUGE-L Recall (MRR) above 0.95 on average in the Entire Documents Extraction task, and it extracts PII from the retrieved documents with nearly 99% accuracy in the PII Extraction task, highlighting the risk of privacy leakage in RAG systems.
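
The abstract describes DEAL only at a high level; as an illustration, the following is a minimal Python sketch of the black-box optimization loop it implies. All names here are hypothetical stand-ins, not the authors' code or interface: query_rag represents the attacker's query access to the target RAG system, query_optimizer_llm is the attacker-side LLM used as the optimizer, and leakage_score is an assumed scoring function (e.g., ROUGE-L recall against documents known to be retrievable).

"""Minimal sketch of the black-box attack loop the abstract describes.

NOT the authors' implementation. `query_rag`, `query_optimizer_llm`,
and `leakage_score` are hypothetical stand-ins for the target RAG
system, the attacker-side optimizer LLM, and a leakage metric.
"""
from typing import Callable

def deal_attack_loop(
    query_rag: Callable[[str], str],            # black-box query access to the RAG system
    query_optimizer_llm: Callable[[str], str],  # attacker-side LLM acting as optimizer
    leakage_score: Callable[[str], float],      # how much private text a response leaks
    seed_suffix: str,
    n_iters: int = 20,
) -> str:
    """Iteratively refine an attack suffix using only query access."""
    best_suffix = seed_suffix
    best_score = leakage_score(query_rag(best_suffix))
    for _ in range(n_iters):
        # Ask the optimizer LLM to propose a refined suffix, conditioned on
        # the current best suffix and the response it elicited.
        response = query_rag(best_suffix)
        prompt = (
            "You are optimizing an instruction suffix so that a RAG assistant "
            "repeats its retrieved context verbatim.\n"
            f"Current suffix: {best_suffix}\n"
            f"Response it produced: {response[:500]}\n"
            "Propose an improved suffix (reply with the suffix only)."
        )
        candidate = query_optimizer_llm(prompt).strip()
        score = leakage_score(query_rag(candidate))
        if score > best_score:  # hill-climb: keep a candidate only if it leaks more
            best_suffix, best_score = candidate, score
    return best_suffix

This hill-climbing formulation is one plausible reading of "LLM as optimizer": no gradients or knowledge of the target model are needed, since the loop consumes only the RAG system's textual responses.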
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8601