Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems

ICLR 2026 Conference Submission 18281 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: security and privacy, red teaming
Abstract: Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, but this can expose them to extraction attacks, creating potential copyright and privacy risks. Existing extraction methods, however, typically rely on malicious inputs such as prompt injection or jailbreaking, making them easy to detect with input- or output-level filters. In this paper, we introduce the **I**mplicit **K**nowledge **E**xtraction **A**ttack (**IKEA**), which performs *Knowledge Extraction* on RAG systems through entirely benign queries. Specifically, **IKEA** first leverages anchor concepts (keywords related to the internal knowledge) to generate natural-looking queries, and then employs two mechanisms that drive the anchor concepts to thoroughly explore the RAG system's knowledge: (1) Experience Reflection Sampling, which samples anchor concepts based on past query-response histories to keep them relevant to the target topic; and (2) Trust Region Directed Mutation, which iteratively mutates anchor concepts under similarity constraints to further exploit the embedding space. Extensive experiments demonstrate **IKEA**'s effectiveness under various defenses, surpassing baselines by over 80% in extraction efficiency and over 90% in attack success rate. Moreover, a substitute RAG system built from **IKEA**'s extractions performs comparably to the original RAG system and outperforms substitutes built from the baselines across multiple evaluation tasks, underscoring the stealthy copyright-infringement risk facing RAG systems.
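As a rough illustration of the Trust Region Directed Mutation idea described above (not the authors' implementation, which is given in the paper itself), the sketch below shows how a similarity trust region might be realized: candidate concepts are admitted only if their embedding similarity to the current anchor falls within a band, and among those the chosen mutation is pushed away from already-explored concepts. All names, signatures, and bounds here (`trust_region_mutate`, `lo=0.5`, `hi=0.8`) are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of a trust-region directed mutation step: select the
# next anchor concept from a candidate pool so that it (a) stays within a
# similarity band of the current anchor and (b) differs from concepts the
# attacker has already explored. Purely illustrative; not the paper's code.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def trust_region_mutate(anchor_vec, candidate_vecs, explored_vecs,
                        lo=0.5, hi=0.8):
    """Return the index of the best candidate, or None if the band is empty.

    lo/hi bound the similarity to the current anchor (the "trust region");
    the score rewards relatedness to the anchor while penalizing overlap
    with previously explored concepts.
    """
    best_idx, best_score = None, -np.inf
    for i, cand in enumerate(candidate_vecs):
        sim = cosine(anchor_vec, cand)
        if not (lo <= sim <= hi):   # outside the trust region: skip
            continue
        # Redundancy = max similarity to anything already explored.
        redundancy = max((cosine(cand, e) for e in explored_vecs), default=0.0)
        score = sim - redundancy    # stay on-topic, but cover new regions
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

Under this reading, the lower bound keeps mutations relevant enough to retrieve documents, while the upper bound and redundancy penalty prevent the attack from repeatedly querying the same neighborhood of the embedding space.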
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18281