Evaluating the Long-Term Memory of Large Language Models

ACL ARR 2025 February Submission3328 Authors

15 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: In applications such as dialogue systems, personalized recommendation, and personal assistants, large language models (LLMs) need to retain and use historical information over the long term to give accurate and consistent responses. Although long-term memory is crucial, recent studies have not thoroughly investigated how well LLMs remember in long-term tasks. To address this gap, we introduce the Long-term Chronological Conversations (LOCCO) dataset and conduct a quantitative evaluation of the long-term memory capabilities of LLMs. Experimental results show that LLMs can retain information from past interactions to a certain extent, but their memory decays over time. Increasing the number of trainable parameters can significantly improve a model's memory of current data, but it also exacerbates long-term forgetting. Rehearsal strategies can improve memory persistence, but, unlike in smaller models, excessive rehearsal is not an effective memory strategy for large models. In addition, the models exhibit memory preferences across different categories of information. Our study provides a new framework and dataset for evaluating the long-term memory capabilities of LLMs and offers a reference point for future work on improving their memory persistence.

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Large language models, Long-term Memory, Conversation Datasets

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 3328
