Keywords: Prompt Extraction, Large Language Models
Abstract: The drastic increase in large language model (LLM) parameters has given rise to a new research direction of fine-tuning-free downstream customization by designing prompts. While these prompt-based agents play an important role in many businesses, growing concerns have emerged about prompt leakage, which undermines the intellectual property of these services and enables downstream attacks. In this paper, we analyze the underlying mechanisms of prompt leakage. By exploring the scaling laws in prompt extraction, we analyze key attributes that influence prompt extraction, including model size, prompt length, and prompt type. Besides, we propose two hypotheses to explain how LLMs expose their prompts. The first attributes leakage to perplexity, i.e., the familiarity of LLMs with the text, whereas the second is based on straightforward token translation paths in the attention matrices. To defend against such threats, we investigate whether alignment can mitigate prompt extraction. We find that current LLMs, even those with safety alignment, are highly vulnerable to prompt extraction attacks, even under the most straightforward user attacks. Therefore, inspired by our findings, we propose several defense strategies, which achieve an almost 71.0% drop in the prompt extraction rate. Our source code is available at https://anonymous.4open.science/r/PromptExtractionEval-C6B7/.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: adversarial attacks, knowledge tracing
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 3980