CloneMem: Benchmarking Long-Term Memory for AI Clones
Keywords: agent memory, agent evaluation
TL;DR: We introduce CloneMem, a benchmark for evaluating AI clones' long-term memory using digital traces (diaries, social media) to assess tracking of experiences, emotions, and evolving opinions over time.
Abstract: AI Clones aim to simulate an individual’s thoughts and behaviors to enable long-term, personalized interaction, placing stringent demands on memory systems to model experiences, emotions, and opinions over time. Existing memory benchmarks primarily rely on user–agent conversational histories, which are temporally fragmented and insufficient for capturing continuous life trajectories. We introduce CloneMem, a benchmark for evaluating long-term memory in AI Clone scenarios grounded in non-conversational digital traces, including diaries, social media posts, and emails, spanning one to three years. CloneMem adopts a top-down data construction framework to ensure longitudinal coherence and defines tasks that assess an agent’s ability to track evolving personal states. Experiments show that current memory mechanisms struggle in this setting, highlighting open challenges for life-grounded personalized AI. Code and dataset are available at https://anonymous.4open.science/r/CloneMem-C6E1
Submission Number: 26