Contextual Experience Replay for Continual Learning of Language Agents

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: large language models, agent, language agents, reasoning, decision making, NLP
TL;DR: We propose a framework that enables language agents to continually learn and adapt in complex environments by leveraging past experiences, thereby improving their decision-making.
Abstract: Large language model-based agents have shown their potential in decision-making tasks, such as web navigation. However, solving multi-step decision-making tasks in complex environments like websites often requires the acquisition of environment-specific experience. Without continually learning environment-specific knowledge, current methods often fail at these complex tasks. To address this, we propose Contextual Experience Replay (CER), a novel training-free framework that enables efficient continual learning for language agents through contextual experience replay, i.e., replaying past experiences in the agent's context window. CER is loosely inspired by experience replay in reinforcement learning, where an agent is trained on past experiences to learn continually. Specifically, CER accumulates and synthesizes past experiences, represented as natural language summarizations and concrete trajectory examples, into a dynamic memory buffer. These experiences encompass environment dynamics and common decision-making patterns, allowing agents to retrieve relevant knowledge and augment themselves with it in new contexts, enhancing their adaptability in complex environments. We evaluate CER on the challenging WebArena and VisualWebArena benchmarks. While orthogonal to other methods, CER improves on the GPT-4o agent baseline by a large margin and achieves competitive results. On VisualWebArena, CER surpasses the tree search method at much lower token cost and achieves a state-of-the-art success rate of 31.9%. On WebArena, CER achieves a competitive average success rate of 33.16%, a relative improvement of 36.69% over the GPT-4o agent baseline. CER shows that continual learning of environment-specific knowledge is important and can lead to significant improvements on sequential decision-making tasks in complex environments.
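The accumulate-retrieve-augment loop described in the abstract can be sketched as a simple in-context memory buffer. The sketch below uses hypothetical names (`Experience`, `ExperienceBuffer`, keyword-overlap retrieval) and is not the authors' implementation; CER's actual synthesis and retrieval mechanisms are described in the paper itself.

```python
# Hypothetical sketch of a contextual experience replay buffer.
# Experiences are stored as natural-language summaries plus concrete
# trajectory examples; the most relevant ones are retrieved and
# prepended to the agent's prompt (its context window) for a new task.

from dataclasses import dataclass, field


@dataclass
class Experience:
    summary: str           # distilled environment dynamics / decision pattern
    trajectory: list[str]  # concrete (observation, action) steps as text


@dataclass
class ExperienceBuffer:
    experiences: list[Experience] = field(default_factory=list)

    def add(self, summary: str, trajectory: list[str]) -> None:
        """Accumulate a distilled experience after finishing a task."""
        self.experiences.append(Experience(summary, trajectory))

    def retrieve(self, task: str, k: int = 2) -> list[Experience]:
        """Rank stored experiences by naive word overlap with the new task."""
        task_words = set(task.lower().split())
        scored = sorted(
            self.experiences,
            key=lambda e: len(task_words & set(e.summary.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def augment_prompt(self, task: str, base_prompt: str) -> str:
        """Prepend retrieved experience summaries to the agent's prompt."""
        hints = "\n".join(f"- {e.summary}" for e in self.retrieve(task))
        return f"Relevant past experience:\n{hints}\n\n{base_prompt}"
```

A real system would replace the word-overlap ranking with embedding-based retrieval and would also inject the trajectory examples, but the data flow (accumulate, retrieve, augment in context, no weight updates) is the training-free pattern the abstract describes.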
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11924