Personalized Privacy Control in LLMs via Attention Head Intervention

Published: 26 May 2026, Last Modified: 26 May 2026
Venue: ICML 2026 FoGen Workshop Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Personalization, Privacy, Interpretability
Abstract: The rise of agentic AI enables LLMs to access diverse user data, raising critical privacy concerns. Prior work on contextual privacy studies whether LLMs regulate information disclosure according to context-dependent norms. However, acceptable disclosure boundaries may vary across users even within the same context. To address this limitation, we introduce personalized privacy, which incorporates user-specific disclosure preferences into privacy control. We further present P3Bench (Personalized Privacy Preservation Benchmark), a benchmark extending contextual privacy policies with personalized disclosure constraints. Experiments show that prompt-based policies fail to reliably enforce personalized privacy constraints, with Qwen2.5-7B and Gemma3-4B showing average policy ignorance ratios of 51.25% and 74.28%, respectively. To address this problem, we propose REPAIR, a novel inference-time attention head intervention method that adjusts disclosure behavior toward policy-consistent responses. Our method significantly reduces both over-refusal and over-sharing, improving adherence to user-specific privacy preferences.
Submission Number: 93