Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

ACL ARR 2025 July Submission 724 Authors

28 Jul 2025 (modified: 25 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, but this may expose them to extraction attacks, leading to potential copyright and privacy risks. However, existing extraction methods typically rely on malicious inputs such as prompt injection or jailbreaking, making them easily detectable via input- or output-level detection. In this paper, we introduce the **I**mplicit **K**nowledge **E**xtraction **A**ttack (**IKEA**), which conducts *Knowledge Extraction* on RAG systems through benign queries. Specifically, **IKEA** first leverages anchor concepts to generate natural-looking queries, and then designs two mechanisms that guide the anchor concepts to thoroughly "explore" the RAG system's knowledge: (1) Experience Reflection Sampling, which samples anchor concepts based on past query-response histories, ensuring their relevance to the topic; and (2) Trust Region Directed Mutation, which iteratively mutates anchor concepts under similarity constraints to further exploit the embedding space. Extensive experiments demonstrate **IKEA**'s effectiveness under various defenses, surpassing baselines by over 80% in extraction efficiency and 90% in attack success rate. Moreover, a substitute RAG system built from **IKEA**'s extractions consistently outperforms those based on baseline methods across multiple evaluation tasks, underscoring the stealthy copyright infringement risk in RAG systems.
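To make the two mechanisms concrete, here is a minimal sketch of how they could compose in one attack round. It is not the authors' implementation: the trigram embedder, the reflection scoring rule, and the similarity band `[lo, hi]` are all illustrative assumptions inferred from the abstract.

```python
# Illustrative sketch of IKEA's two anchor-concept mechanisms, as described in
# the abstract. Embedder, scoring rule, and similarity band are assumptions
# for exposition only, not the authors' implementation.
import hashlib
import random

import numpy as np


def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence embedder: hashed character trigrams,
    so lexically similar strings land close together."""
    v = np.zeros(64)
    t = text.lower()
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i : i + 3].encode()).hexdigest(), 16)
        v[h % 64] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)


def reflection_sample(candidates: list[str],
                      history: list[tuple[str, bool]]) -> str:
    """Experience Reflection Sampling: favor anchor concepts close to past
    queries that drew informative responses and far from ones that failed."""
    def score(c: str) -> float:
        e = embed(c)
        good = [cosine(e, embed(q)) for q, ok in history if ok]
        bad = [cosine(e, embed(q)) for q, ok in history if not ok]
        return (np.mean(good) if good else 0.0) - (np.mean(bad) if bad else 0.0)

    return max(candidates, key=score)


def trust_region_mutate(anchor: str, pool: list[str],
                        lo: float = 0.2, hi: float = 0.9) -> str:
    """Trust Region Directed Mutation: pick a mutation whose similarity to
    the current anchor stays within [lo, hi] -- close enough to remain
    on-topic, far enough to reach unexplored parts of the embedding space."""
    e = embed(anchor)
    in_region = [c for c in pool if lo <= cosine(e, embed(c)) <= hi]
    return random.choice(in_region) if in_region else anchor


# One example round: sample an anchor, mutate it, wrap it in a benign query.
history = [("coral reef ecology", True), ("quantum chess rules", False)]
anchor = reflection_sample(["coral bleaching", "chess openings"], history)
anchor = trust_region_mutate(anchor, ["coral reefs and fish",
                                      "reef bleaching events", "poetry"])
print(f"Could you explain the basics of {anchor}?")
```

Under these assumptions, the key design point is the similarity band: each individual query stays benign and on-topic, so no single input trips input- or output-level detection, while the lower bound forces successive anchors outward to cover new regions of the knowledge base.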
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy, red teaming
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 724