Mitigating Unintended Memory Use in LLMs via Structured Memory

Published: 04 Jun 2026, Last Modified: 04 Jun 2026
Venue: ICML MemFM 2026 Workshop Poster
License: CC BY 4.0
Keywords: personalisation, sycophancy, memory, cross-domain-leakage
Abstract: Conversational language models increasingly rely on persistent user memories for personalization, which creates an inference-time surface for unintended recall of stored user information. Unintended memorization is often studied as training-data extraction, but deployed memory systems introduce a parallel privacy risk: models may leak sensitive user details across unrelated contexts or defer sycophantically to remembered preferences. We investigate representation-level mitigations that reorganize the same memory set at inference time into fixed-domain partitions, dynamic-domain partitions, or a two-level memory tree, without changing the model or the memory content. On PersistBench, across seven frontier models, fixed partitioning reduces cross-domain leakage in six of the seven, and dynamic partitioning reduces it in all seven, lowering leakage by $\sim 8\%$ on average relative to the flat baseline while preserving desired personalization. These transformations also stack with some prompt-based defenses. Our work positions structured memory as a practical mitigation for unintended memorization in deployed foundation models, complementing prompt defenses.
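To make the idea of fixed-domain partitioning concrete, the following is a minimal Python sketch, not the authors' implementation. It regroups a flat memory list under domain headers before it is rendered into the prompt, leaving every memory string unchanged. The domain names, the keyword lists, and the functions `assign_domain`, `partition_memories`, and `render_memory_block` are all hypothetical and chosen only for illustration; the abstract does not specify how memories are assigned to domains.

```python
from collections import defaultdict

# Hypothetical fixed domains and keywords (assumptions, not taken from the paper).
FIXED_DOMAINS = {
    "health": {"allergy", "doctor", "medication", "diagnosis"},
    "finance": {"salary", "loan", "budget", "mortgage"},
    "travel": {"flight", "hotel", "trip", "passport"},
}


def assign_domain(text):
    """Return the first fixed domain whose keywords occur in text, else 'general'."""
    cleaned = text.lower()
    for ch in ".,?!":
        cleaned = cleaned.replace(ch, " ")
    words = set(cleaned.split())
    for domain, keywords in FIXED_DOMAINS.items():
        if words & keywords:
            return domain
    return "general"


def partition_memories(memories):
    """Group the same memory strings by domain; no memory is edited or dropped."""
    partitions = defaultdict(list)
    for memory in memories:
        partitions[assign_domain(memory)].append(memory)
    return dict(partitions)


def render_memory_block(partitions):
    """Render partitions as a domain-labelled block for the model's context."""
    lines = []
    for domain in sorted(partitions):
        lines.append(f"[{domain}]")
        lines.extend(f"- {memory}" for memory in partitions[domain])
    return "\n".join(lines)


memories = [
    "User has a peanut allergy.",
    "User is saving for a mortgage.",
    "User likes jazz.",
]
partitions = partition_memories(memories)
print(render_memory_block(partitions))
```

A flat baseline would pass `memories` to the model as a single unlabelled list. Here the content is identical and only its organization changes, which matches the abstract's constraint that neither the model nor the memory content is altered. A dynamic-domain variant would presumably derive the domain labels from the memories themselves instead of using a fixed table.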
Submission Number: 56