Plausible Token Amplification for Improving Accuracy of Differentially Private In-Context Learning Based on Implicit Bayesian Inference

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose Plausible Token Amplification (PTA), the first theoretically grounded method for improving the accuracy of DP-ICL by highlighting task-relevant tokens.
Abstract: We propose Plausible Token Amplification (PTA) to improve the accuracy of Differentially Private In-Context Learning (DP-ICL) using DP synthetic demonstrations. While Tang et al. empirically improved the accuracy of DP-ICL by limiting the vocabulary space during DP synthetic demonstration generation, the theoretical basis of their method remains unexplored. By interpreting ICL as implicit Bayesian inference on a concept underlying the demonstrations, we not only provide theoretical evidence supporting Tang et al.'s empirical method but also introduce PTA, a refined method for modifying the next-token probability distribution. Through this modification, PTA highlights tokens that distinctly represent the ground-truth concept underlying the original demonstrations. As a result, the generated DP synthetic demonstrations guide the Large Language Model to successfully infer the ground-truth concept, which improves the accuracy of DP-ICL. Experimental evaluations on both synthetic and real-world text-classification datasets validated the effectiveness of PTA.
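To make the "amplification" idea concrete, below is a minimal illustrative sketch in Python of one way such a modification could look: the next-token distribution conditioned on the demonstrations is reweighted against a task-agnostic distribution, so tokens distinctive of the underlying concept gain probability mass. The ratio-based rule, the `alpha` parameter, and the function name are assumptions made for exposition only; they are not the exact PTA rule or the DP mechanism from the paper.

```python
import numpy as np

def amplify_plausible_tokens(p_task, p_generic, alpha=1.0, eps=1e-12):
    """Reweight a next-token distribution so tokens that are more likely
    under the task-conditioned model than under a generic (task-agnostic)
    model are amplified. Hypothetical illustration; the actual PTA rule
    is defined in the paper."""
    score = (p_task + eps) / (p_generic + eps)   # how distinctive each token is for the task
    weighted = p_task * score ** alpha           # amplify concept-distinctive tokens
    return weighted / weighted.sum()             # renormalize to a probability distribution

# Toy 5-token vocabulary:
p_task = np.array([0.40, 0.30, 0.15, 0.10, 0.05])     # next-token probs given the demonstrations
p_generic = np.array([0.20, 0.35, 0.20, 0.15, 0.10])  # next-token probs without demonstrations
print(amplify_plausible_tokens(p_task, p_generic, alpha=1.0))
```

With alpha=0 this reduces to the unmodified task-conditioned distribution; larger alpha concentrates mass on tokens that distinctly represent the underlying concept, which is the intuition the abstract attributes to PTA.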
Lay Summary: Large language models (LLMs) can solve tasks when shown just a few related examples in a prompt, a capability called in-context learning (ICL). While useful, including raw examples in the prompt risks leaking sensitive information. To address this, researchers rely on Differential Privacy (DP) --- a standard that limits information leakage --- adding noise to create synthetic examples for ICL and thereby protecting the privacy of the raw examples. This method, known as DP-ICL, mitigates leakage of individual data but often degrades accuracy. Although a practical remedy exists, its effectiveness has been theoretically unclear. We offer a theoretical explanation showing that the existing remedy is reasonable while also revealing room for improvement. On the basis of this insight, we propose Plausible Token Amplification (PTA), the first theoretically grounded approach that improves DP-ICL by generating clearer, more informative synthetic examples while preserving privacy. This enables more accurate and private use of LLMs in sensitive tasks such as medical triage, legal review, and customer profiling.
Link To Code: https://github.com/Yusuke-Yamasaki/pta
Primary Area: Social Aspects->Privacy
Keywords: in-context learning, differential privacy, large language models
Submission Number: 1991