Deployment-Time Memorization in Foundation-Model Agents

Published: 04 Jun 2026, Last Modified: 04 Jun 2026
Venue: ICML MemFM 2026 Workshop Oral
License: CC BY 4.0
Keywords: Memory, Agent
Abstract: Foundation-model agents are becoming long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than only a property of model weights. Existing work studies parametric memorization or audits fixed agent-memory configurations, but does not characterize how memory design knobs shape personalization, extraction risk, and deletion fidelity. We study this surface as deployment-time memorization. We formulate agent memory as a privacy-utility frontier, measuring utility with Personalization Recall and leakage with Adversarial Extraction Rate, and sweep two practical knobs: summarization aggressiveness and retrieval breadth. We further introduce the Forgetting Residue Score to measure whether deleted information remains recoverable from derived memory tiers. On LongMemEval, key-fact summarization reduces canary extraction by 76% on Gemma 3 12B and 64% on GPT-4o-mini while preserving nearly all personalization recall; once canaries are compressed away, increasing top-$k$ retrieval breadth no longer restores the leakage. However, the same compression creates a deletion-fidelity failure: raw-only deletion leaves derived summary copies recoverable in roughly 20% of instances, while only full-pipeline purge or tombstone redaction drives worst-tier residue to zero. These results show that persistent agent memory should be evaluated as a first-class memorization mechanism: by what it helps agents recall, what it makes extractable, and what it can truly erase.
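The deletion-fidelity failure described in the abstract can be illustrated with a toy two-tier memory. This is only a sketch under stated assumptions: the abstract does not define the Forgetting Residue Score, so `worst_tier_residue` here is a hypothetical stand-in (the fraction of instances in which a deleted secret is still found by substring match, maximized over tiers), and `TieredMemory`, its methods, and the first-sentence "summarizer" are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Hypothetical two-tier store: raw turns plus summaries derived from them."""
    raw: dict = field(default_factory=dict)        # turn_id -> original text
    summaries: dict = field(default_factory=dict)  # turn_id -> derived text
    tombstones: set = field(default_factory=set)   # strings redacted on every read

    def write(self, turn_id, text):
        self.raw[turn_id] = text
        # Stand-in summarizer (assumption): keep only the first sentence.
        self.summaries[turn_id] = text.split(".")[0]

    def delete_raw_only(self, turn_id):
        # Removes the source turn but leaves the derived summary behind.
        self.raw.pop(turn_id, None)

    def delete_full_purge(self, turn_id):
        # Removes the turn from every tier of the pipeline.
        self.raw.pop(turn_id, None)
        self.summaries.pop(turn_id, None)

    def delete_tombstone(self, secret):
        # Keeps stored text but redacts the secret whenever a tier is read.
        self.tombstones.add(secret)

    def tiers(self):
        def redact(text):
            for secret in self.tombstones:
                text = text.replace(secret, "[REDACTED]")
            return text
        return {
            "raw": [redact(t) for t in self.raw.values()],
            "summary": [redact(t) for t in self.summaries.values()],
        }


def worst_tier_residue(memories, secrets):
    """Per-tier fraction of instances where the secret survives; return the max."""
    per_tier = {}
    for mem, secret in zip(memories, secrets):
        for name, texts in mem.tiers().items():
            per_tier.setdefault(name, []).append(any(secret in t for t in texts))
    if not per_tier:
        return 0.0
    return max(sum(hits) / len(hits) for hits in per_tier.values())


def build_example():
    """One memory instance whose first sentence carries the secret 'X123'."""
    mem = TieredMemory()
    mem.write("t1", "My passport number is X123. I also like tea.")
    return mem
```

In this toy setup, calling `delete_raw_only("t1")` on the example leaves the secret in the summary tier, so the residue for that single instance is 1.0, whereas `delete_full_purge("t1")` or `delete_tombstone("X123")` brings it to 0.0. The roughly 20% figure in the abstract comes from the authors' experiments and is not reproduced by this sketch.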
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 51