Keywords: Privacy in LLMs, Differential Privacy, Hallucination, Trustworthiness
TL;DR: We propose PEARL, an entropy-regulated framework for private language generation.
Abstract: Large language models (LLMs) commonly adopt Retrieval-Augmented Generation (RAG) to improve faithfulness. However, carefully crafted extraction prompts can elicit sensitive private information. Differential Privacy (DP) has therefore been integrated into LLM inference and is widely regarded as a standard safeguard; yet most work focuses on the utility–privacy trade-off, leaving the trustworthiness of DP outputs underexplored. To assess trustworthiness, we revisit the confidence gap (CG), which quantifies an LLM’s internal knowledge conflict. We show that CG correlates with both hallucination and exposure of personally identifiable information (PII). Building on this insight, we present PEARL, a CG‑guided, entropy‑aware private decoding framework. PEARL adaptively allocates the privacy budget across tokens and sentences based on CG, concentrating protection on spans likely to contain PII while stabilizing low‑confidence, hallucination‑prone regions. In experiments, PEARL improves response trustworthiness and robustness to PII extraction attacks.
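The abstract does not specify how CG is computed or how the budget allocation is realized, so the following is only a minimal illustrative sketch of the CG-guided idea, not the authors' implementation. It assumes CG is the probability gap between the top two next-token candidates and that privacy is applied via an exponential-mechanism-style, epsilon-scaled softmax over the logits; the allocation rule (less budget, i.e. more noise, on high-CG tokens that are more likely to reproduce memorized PII) is likewise an assumption for illustration.

```python
import numpy as np

def confidence_gap(logits):
    """Confidence gap (CG): assumed here to be the probability gap between
    the top-1 and top-2 tokens of the next-token distribution."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top2 = np.sort(probs)[-2:]
    return top2[1] - top2[0]

def allocate_epsilon(cg, eps_total, n_tokens, eps_min=0.1):
    """Toy per-token budget rule (an assumption, not the paper's rule):
    high-CG tokens get a smaller epsilon (stronger protection), while
    low-CG, hallucination-prone tokens keep more budget for stability."""
    base = eps_total / n_tokens
    return max(eps_min, base * (1.0 - cg))

def private_sample(logits, eps, rng):
    """Exponential-mechanism-style sampling: epsilon acts as an inverse
    temperature, so a smaller epsilon yields a flatter, more private
    distribution over tokens."""
    scaled = eps * (logits - logits.max())
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(0)
vocab_size, n_tokens, eps_total = 50, 20, 20.0
for t in range(n_tokens):
    logits = rng.normal(size=vocab_size)   # stand-in for real model logits
    cg = confidence_gap(logits)
    eps_t = allocate_epsilon(cg, eps_total, n_tokens)
    token = private_sample(logits, eps_t, rng)
    print(f"step {t}: CG={cg:.3f}  eps={eps_t:.2f}  token={token}")
```

In this toy loop the total budget eps_total is spread over n_tokens steps and then modulated per token by CG; a faithful implementation would follow the paper's actual CG definition, sentence-level allocation, and accounting.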
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14975