If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

ACL ARR 2025 May Submission2568 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) can carry out human-like dialogue, but unlike humans they are stateless due to the superposition property. Nevertheless, during multi-turn, multi-agent interactions, LLMs begin to exhibit consistent, character-like behaviors, hinting at a form of emergent lifelong learning. Existing benchmarks fail to capture these dynamics, focusing primarily on static, open-ended evaluations. To address this gap, we introduce LifeState-BENCH, a benchmark designed to assess lifelong learning in LLMs. It features two episodic datasets, Hamlet and a synthetic script collection, both rich in narrative structure and character interaction. Our fact-checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking across both parametric and non-parametric approaches. In experiments on models such as Llama3.1-8B, GPT-4-turbo, and DeepSeek R1, we demonstrate that non-parametric methods significantly outperform parametric ones in managing stateful learning. However, all models struggle with catastrophic forgetting as interactions lengthen, highlighting the need for further advances in lifelong learning.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Lifelong Learning, Multi-agent Interaction, Benchmark
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English, Chinese
Submission Number: 2568