Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy

Published: 01 May 2025 | Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY-NC 4.0
TL;DR: We propose Cape, a context-aware and bucketized prompt perturbation mechanism based on differential privacy, to enable efficient LLM inference with an improved privacy-utility trade-off.
Abstract: Large Language Models (LLMs) have gained significant popularity due to their remarkable capabilities in text understanding and generation. However, despite their widespread deployment in inference services such as ChatGPT, concerns have arisen about the potential leakage of sensitive user data. Existing solutions primarily rely on privacy-enhancing technologies to mitigate such risks, but they face a trade-off among efficiency, privacy, and utility. To narrow this gap, we propose Cape, a context-aware prompt perturbation mechanism based on differential privacy that enables efficient inference with an improved privacy-utility trade-off. Concretely, we introduce a hybrid utility function that better captures token similarity. Additionally, we propose a bucketized sampling mechanism to handle the large sampling space, which can otherwise lead to long-tail phenomena. Extensive experiments across multiple datasets, along with ablation studies, demonstrate that Cape achieves a better privacy-utility trade-off than prior state-of-the-art works.
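To make the idea of a context-aware, DP-based token perturbation concrete, the sketch below shows a minimal exponential-mechanism-style sampler over the vocabulary whose utility mixes embedding similarity with a context score. The abstract does not specify the exact form of Cape's hybrid utility or its mechanism; all names and parameters here (`context_logits`, `lam`, the sensitivity constant) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def perturb_token(token_id, vocab_embeddings, context_logits, epsilon, lam=0.5):
    """Illustrative DP token perturbation (hypothetical sketch, not Cape itself).

    token_id        : index of the sensitive input token
    vocab_embeddings: (V, d) array of token embeddings
    context_logits  : (V,) context scores for the current position (assumed given)
    epsilon         : per-token differential-privacy budget
    lam             : mixing weight between similarity and context utility (assumed)
    """
    emb = vocab_embeddings
    x = emb[token_id]
    # Cosine similarity between the original token and every candidate token.
    sim = emb @ x / (np.linalg.norm(emb, axis=1) * np.linalg.norm(x) + 1e-12)

    # Hybrid utility: embedding similarity mixed with a normalized context score.
    ctx = (context_logits - context_logits.min()) / (np.ptp(context_logits) + 1e-12)
    utility = lam * sim + (1.0 - lam) * ctx

    # Exponential mechanism: sample a replacement with probability proportional to
    # exp(eps * u / (2 * sensitivity)); utilities lie roughly in [-1, 1], so a
    # sensitivity of 2 is used purely for illustration.
    scores = epsilon * utility / (2.0 * 2.0)
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)
```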
Lay Summary: Despite the widespread deployment of large language models (LLMs) in inference services such as ChatGPT, concerns about the potential leakage of sensitive user data in prompts have arisen. We want to answer the question "Can we guarantee prompt privacy with a good privacy-utility trade-off?" To safeguard sensitive user information during inference, we leverage differential privacy (DP), which provides quantifiable privacy protection. We investigated how incorporating contextual information can improve the utility of DP-based perturbation. In addition, we examined the long-tail distribution problem prevalent in large-vocabulary settings. Interestingly, partitioning the vocabulary (i.e., the sampling space) into multiple buckets significantly alleviates this issue. The technique introduced can be integrated into LLM serving stacks, offering immediate improvements to prompt-level privacy with minimal changes to existing APIs or service infrastructure. As a result, it brings private LLM inference closer to practical, scalable deployment in real-world applications.
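The bucketing idea mentioned above can be sketched as follows: group candidate tokens into a small number of utility buckets, choose a bucket with the exponential mechanism, then draw a token uniformly within it, so that the mass of low-utility tokens in a large vocabulary no longer dominates the tail of the sampling distribution. This is a hedged illustration only; the bucket count, the equal-width partitioning, and the helper names are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def bucketized_sample(utility, epsilon, num_buckets=32, rng=None):
    """Illustrative bucketized sampling over a large vocabulary (hypothetical sketch).

    utility     : (V,) per-token utility scores
    epsilon     : differential-privacy budget for this selection
    num_buckets : number of utility buckets the vocabulary is partitioned into (assumed)
    """
    rng = rng or np.random.default_rng()

    # Partition the sampling space into equal-width utility buckets.
    edges = np.linspace(utility.min(), utility.max(), num_buckets + 1)
    bucket_of = np.clip(np.digitize(utility, edges[1:-1]), 0, num_buckets - 1)

    # Score each non-empty bucket by its utility midpoint and pick a bucket
    # via the exponential mechanism.
    non_empty = np.unique(bucket_of)
    mids = (edges[non_empty] + edges[non_empty + 1]) / 2.0
    scores = epsilon * mids / 2.0
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    chosen = rng.choice(non_empty, p=probs)

    # Sample a token uniformly from the chosen bucket.
    candidates = np.flatnonzero(bucket_of == chosen)
    return int(rng.choice(candidates))
```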
Primary Area: Social Aspects->Privacy
Keywords: differential privacy, private selection, large language model, black-box inference
Submission Number: 1271