Mitigating Over-Personalization in Language Models via Structured Memory

Hakeem Hannoon; Andrew Zhao; Mihir Narayan; Sharvin Goyal; Ivaxi Sheth

Published: 23 May 2026, Last Modified: 23 May 2026

Venue: ICML 2026 AIWILD

License: CC BY 4.0

Keywords: personalisation, sycophancy, memory

Abstract: Conversational language models increasingly rely on persistent user memories for personalization, creating an inference-time surface for unintended recall of stored user information. While agentic systems raise broader safety and security concerns, personalized LLMs introduce a specific privacy and trustworthiness risk: models may leak sensitive user details across unrelated contexts or defer sycophantically to remembered preferences. We investigate representation-level mitigations that reorganize the same memory set into fixed-domain partitions, dynamic-domain partitions, or a two-level memory tree, without changing the model or memory content. On PersistBench across seven frontier models, fixed partitioning reduces cross-domain leakage in six of seven models, while dynamic partitioning improves all seven and lowers leakage by $\sim8\%$ on average relative to the flat baseline while preserving desired personalization. These transformations also stack with some prompt-based defenses. Our work positions structured memory as a practical safety mechanism for deployed personalized language models, complementary to prompt defenses.
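To make the idea of reorganizing one memory set concrete, the following is a minimal sketch of a fixed-domain partition, written under assumptions that do not come from the paper: the domain list, the keyword cues, the `assign_domain` matcher, and the bracketed serialization format are all hypothetical placeholders. It only illustrates that the memory strings stay unchanged while their grouping changes; it is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical fixed domain list and keyword cues; not taken from the paper.
FIXED_DOMAINS = {
    "health": ("allergy", "medication", "diagnosis"),
    "finance": ("salary", "loan", "budget"),
    "work": ("manager", "deadline", "project"),
}


def assign_domain(memory: str) -> str:
    """Toy keyword matcher standing in for whatever classifier a real system uses."""
    text = memory.lower()
    for domain, cues in FIXED_DOMAINS.items():
        if any(cue in text for cue in cues):
            return domain
    return "other"


def partition_memories(memories):
    """Group the same memory strings by domain without altering their content."""
    partitions = defaultdict(list)
    for memory in memories:
        partitions[assign_domain(memory)].append(memory)
    return dict(partitions)


def render_flat(memories):
    """Flat baseline: one undifferentiated list of memories."""
    return "\n".join(f"- {m}" for m in memories)


def render_partitioned(partitions):
    """Serialize partitions under domain headers for inclusion in a prompt."""
    blocks = []
    for domain in sorted(partitions):
        lines = "\n".join(f"- {m}" for m in partitions[domain])
        blocks.append(f"[{domain}]\n{lines}")
    return "\n\n".join(blocks)


memories = [
    "User has a peanut allergy.",
    "User is saving for a home loan.",
    "User's project deadline is Friday.",
]
partitions = partition_memories(memories)
flat_text = render_flat(memories)
partitioned_text = render_partitioned(partitions)
```

Both renderings contain exactly the same three memory strings; only the partitioned one attaches a domain label to each group. A dynamic-domain variant would presumably derive the domain set from the memories themselves rather than from a fixed table, but how the paper does so is not stated in the abstract.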

Track: Short Paper (4 pages)

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 242
