Keywords: Personalization, Privacy, Interpretability
TL;DR: Personalized Privacy Control in LLMs via Attention Head Intervention
Abstract: The rise of agentic AI enables LLMs to access diverse user data, raising critical privacy concerns.
Prior work on contextual privacy studies whether LLMs regulate information disclosure according to context-dependent norms.
However, acceptable disclosure boundaries may vary across users even within the same context.
To address this limitation, we introduce personalized privacy, which incorporates user-specific disclosure preferences into privacy control.
We further present P3Bench (Personalized Privacy Preservation Benchmark), a benchmark extending contextual privacy policies with personalized disclosure constraints.
Experiments show that prompt-based policies fail to reliably enforce personalized privacy constraints, with Qwen2.5-7B and Gemma3-4B showing average policy ignorance ratios of 51.25% and 74.28%, respectively.
To address this problem, we propose REPAIR, a novel inference-time attention head intervention method that adjusts disclosure behavior toward policy-consistent responses.
Our method significantly reduces both over-refusal and over-sharing, improving adherence to user-specific privacy preferences.
Submission Number: 93