RAGLeak: Membership Inference Attacks on RAG-Based Large Language Models

Published: 01 Jan 2025 · Last Modified: 02 Aug 2025 · ACISP (3) 2025 · CC BY-SA 4.0
Abstract: As newly emerged powerful tools for question-answering tasks, large language models (LLMs) have attracted significant attention in recent years. Despite their strong performance on various tasks, LLMs often lack domain-specific or up-to-date knowledge. Retrieval-augmented generation (RAG) has been proposed to supplement LLMs and mitigate hallucination. However, while RAG systems offer considerable improvements, they also introduce new attack surfaces and privacy risks for LLMs. Current RAG systems are often considered robust against privacy attacks because malicious queries can be easily detected and blocked. We find, however, that a hidden privacy attack can still jeopardize the privacy of RAG. To demonstrate this risk, we propose RAGLeak, a stealthy novel attack method that infers membership information in RAG-based LLMs. Our method crops the query into two parts, using the former part as the input and the latter part as the ground-truth answer. We investigate the grey-box setting, where the attacker can access the output perplexity, and the black-box setting, where the attacker only has input/output access. In the grey-box setting, we decide the membership status by thresholding the output perplexity; in the black-box setting, we compute the similarity between the output and the ground-truth answer and apply a similarity threshold. We evaluate our method on two datasets and three LLMs. The results show that RAGLeak bypasses a few-shot-based defense method and achieves an accuracy above 0.8 on all datasets and LLMs. This competitive performance demonstrates that our method poses significant privacy risks without complicated fine-tuning or retraining. This work reveals the potential privacy risks of RAG for LLMs and highlights the need for robust privacy-preserving techniques in RAG-based LLMs.
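
The decision rules sketched in the abstract can be illustrated with a few lines of code. The snippet below is a minimal reconstruction under stated assumptions, not the authors' implementation: the RAG query interface (`query_rag`), the perplexity source, the 50/50 crop ratio, and all threshold values are illustrative placeholders, and the black-box similarity is approximated with a simple character-level ratio rather than whatever metric the paper uses.

```python
# Illustrative sketch of the RAGLeak grey-box and black-box decision rules.
# query_rag, the perplexity source, and the thresholds are hypothetical.
from difflib import SequenceMatcher
from typing import Callable, Tuple


def split_sample(sample: str, ratio: float = 0.5) -> Tuple[str, str]:
    """Crop a candidate sample: the former part becomes the query,
    the latter part the held-out ground-truth answer."""
    cut = int(len(sample) * ratio)
    return sample[:cut], sample[cut:]


def greybox_membership(perplexity: float, threshold: float = 5.0) -> bool:
    """Grey-box rule: low perplexity on the RAG output suggests the
    candidate sample is present in the retrieval store."""
    return perplexity < threshold


def blackbox_membership(output: str, true_answer: str,
                        threshold: float = 0.7) -> bool:
    """Black-box rule: high similarity between the model output and the
    held-out continuation suggests membership."""
    similarity = SequenceMatcher(None, output, true_answer).ratio()
    return similarity >= threshold


def ragleak_blackbox(sample: str, query_rag: Callable[[str], str],
                     threshold: float = 0.7) -> bool:
    """End-to-end black-box inference for one candidate sample,
    requiring only input/output access to the RAG system."""
    prefix, true_answer = split_sample(sample)
    output = query_rag(prefix)
    return blackbox_membership(output, true_answer, threshold)
```

In this reading, the grey-box attacker additionally observes the output perplexity and applies `greybox_membership`, while the black-box attacker relies solely on comparing the generated continuation against the cropped-off ground truth.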