Mitigating Over-Personalization in Language Models via Structured Memory
Keywords: personalisation, sycophancy, memory
Abstract: Conversational language models increasingly rely on persistent user memories for personalization, which creates an inference-time surface for unintended recall of stored user information. While agentic systems raise broader safety and security concerns, personalized LLMs introduce a specific privacy and trustworthiness risk: models may leak sensitive user details across unrelated contexts or defer sycophantically to remembered preferences. We investigate representation-level mitigations that reorganize the same memory set into fixed-domain partitions, dynamic-domain partitions, or a two-level memory tree, without changing the model or the memory content. On PersistBench, across seven frontier models, fixed partitioning reduces cross-domain leakage in six of the seven models. Dynamic partitioning reduces leakage in all seven, lowering it by $\sim8\%$ on average relative to the flat baseline while preserving desired personalization. These transformations also stack with some prompt-based defenses. Our work positions structured memory as a practical safety mechanism for deployed personalized language models, complementary to prompt defenses.
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 242
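
The fixed-domain partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the domain list, the keyword-based `assign_domain` rule, and the rendering format are all hypothetical choices made for illustration. The sketch shows only the core idea that the same memory set is regrouped under domain headers before it is placed in the model's context, with no memory added, removed, or rewritten.

```python
from collections import defaultdict

# Hypothetical fixed domains and keyword cues (assumption: the paper's
# actual domain list and assignment method are not stated in the abstract).
DOMAIN_KEYWORDS = {
    "health": ("allergy", "medication", "diagnosis"),
    "work": ("manager", "deadline", "office"),
    "food": ("vegetarian", "recipe", "restaurant"),
}


def assign_domain(memory: str) -> str:
    """Return the first fixed domain whose keyword appears in the memory."""
    text = memory.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return "other"  # fallback bucket for memories matching no domain


def partition_memories(memories: list[str]) -> dict[str, list[str]]:
    """Group a flat memory list by domain without altering any memory."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for memory in memories:
        partitions[assign_domain(memory)].append(memory)
    return dict(partitions)


def render_flat(memories: list[str]) -> str:
    """Flat baseline: one undifferentiated bullet list."""
    return "\n".join(f"- {memory}" for memory in memories)


def render_partitioned(memories: list[str]) -> str:
    """Same memories, regrouped under domain headers."""
    partitions = partition_memories(memories)
    blocks = [
        f"[{domain}]\n{render_flat(partitions[domain])}"
        for domain in sorted(partitions)
    ]
    return "\n\n".join(blocks)


memories = [
    "Has a peanut allergy",
    "Manager is named Priya",
    "Prefers vegetarian recipes",
    "Owns a dog",
]
print(render_partitioned(memories))
```

A dynamic-domain variant would presumably derive the domain labels from the memory set itself rather than from a fixed dictionary, and a two-level tree would add a second layer of grouping. Both are left out here because the abstract does not specify how they are constructed.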