Private Investigator: Extracting Personally Identifiable Information from Large Language Models Using Optimized Prompts

Published: 13 Aug 2025, Last Modified: 25 Jan 2026 · 34th USENIX Security Symposium · CC BY 4.0
Abstract: Recent studies on training data extraction attacks have demonstrated significant threats to the language model ecosystem. In a typical deployment scenario where a pre-trained language model is fine-tuned on users' private data, an adversary may attempt to extract personally identifiable information (PII) memorized by the fine-tuned model. Prior work has demonstrated this privacy risk by inducing a model to output PII in response to handcrafted or outsourced prompts. However, little attention has been given to how a smart adversary would design optimal prompts for successful PII extraction. In this work, we address this knowledge gap. We propose Private Investigator, an attack framework designed to optimize prompts for querying a target language model to extract PII used in its fine-tuning process. We propose a new prompt generation method that crafts promising prompts, inducing the target language model to emit as many PII items as possible by exploring diverse contexts. Private Investigator then exploits these generated prompts to conduct extraction attacks. To this end, we develop a prompt selection strategy that prioritizes the most promising prompts for successful PII extraction, taking full advantage of each extraction attack opportunity. In our evaluation, Private Investigator extracts up to 1,254 more email addresses, 634 more phone numbers, and 5,087 more personal names than existing attacks.
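The abstract describes a two-stage attack: generate candidate prompts, then spend a limited query budget on the prompts most likely to surface new PII. The paper's actual method is not reproduced here; the following is a minimal hedged sketch of one plausible greedy selection loop, where `query_model`, the regex-based PII extractor, and the yield-based re-scoring are all illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical email matcher standing in for a full PII extractor
# (the paper also targets phone numbers and personal names).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_pii(text):
    """Return the set of email-like strings found in model output."""
    return set(EMAIL_RE.findall(text))

def run_attack(prompts, query_model, budget):
    """Greedy prompt-selection loop (an assumed strategy, not the
    paper's): repeatedly query the prompt with the highest score,
    where a prompt's score is the number of new PII items its most
    recent query yielded."""
    scores = {p: float("inf") for p in prompts}  # optimistic init: try each prompt once
    seen = set()
    for _ in range(budget):
        best = max(scores, key=scores.get)
        new_items = extract_pii(query_model(best)) - seen
        seen |= new_items
        scores[best] = len(new_items)  # re-score by fresh PII yield
    return seen
```

A caller would supply `query_model` as a function that sends a prompt to the target model and returns the generated text; the loop then concentrates the remaining budget on whichever prompts keep producing unseen PII.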