LeakDojo: Decoding the Leakage Threats of RAG Systems

ACL ARR 2026 January Submission148 Authors

22 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: RAG leakage; data security.

Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG systems grow more complex and LLMs exhibit stronger instruction-following capabilities, existing studies fall short of systematically assessing RAG leakage risks. We present LeakDojo, a configurable framework for controlled evaluation of RAG leakage. Using LeakDojo, we benchmark six existing attacks across fourteen LLMs, four datasets, and diverse RAG systems. Our study reveals that (1) query generation and adversarial instructions contribute independently to leakage, with overall leakage well approximated by their product; (2) stronger instruction-following capability correlates with higher leakage risk; and (3) improvements in RAG faithfulness can increase leakage risk. These findings provide actionable insights for understanding and mitigating RAG leakage in practice. Our codebase is available at https://anonymous.4open.science/r/leakdojo055A.

Paper Type: Long

Research Area: Safety and Alignment in LLMs

Research Area Keywords: security and privacy, retrieval-augmented generation

Contribution Types: Model analysis & interpretability, Reproduction study, Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 148
