Chain-of-Experience for Continual LLM Improvement

ACL ARR 2026 January Submission 8081 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: self-improve, learn from experience, language model, benchmarking
Abstract: Humans continuously learn from experience, whereas conventional large language model (LLM) evaluations ignore models' ability to improve through inference-time interaction. In this paper, we study how LLMs learn from iterative experience at test time, a setting we refer to as Chain-of-Experience (CoE), in which models improve across repeated attempts with feedback. Through iterative interaction with self-generated or environmental feedback, models accumulate experiential traces that inform future problem solving, forming a continual improvement loop beyond zero-shot inference. We instantiate CoE with diverse feedback mechanisms, including model self-feedback and environmental signals such as correctness or public coding test pass rates, and evaluate across math, coding, and knowledge domains using 7 LLMs, including GPT-5, Gemini-2.5 Pro, and Claude-4.5 Sonnet. Our study shows that leveraging iterative experience consistently outperforms feedback-free baselines, achieving substantial performance gains with self-feedback alone, alongside a 5.6% overall improvement and 19% lower API cost across tasks and models. We further observe a positive correlation between an LLM's base ability and its capacity for improvement, and show that models can still improve under weak or spurious feedback, with different feedback types contributing to distinct aspects of improvement and most gains emerging early in the experience iterations.
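To make the CoE loop described above concrete, here is a minimal Python sketch of the iterative experience-accumulation process. All names (`query_model`, `get_feedback`, `chain_of_experience`) are hypothetical stand-ins for an LLM API call and a feedback signal (e.g., a correctness check or public-test pass rate); this is an illustration of the setting, not the authors' implementation.

```python
from typing import Callable

def chain_of_experience(
    query_model: Callable[[str], str],   # hypothetical LLM call: prompt -> answer
    get_feedback: Callable[[str], str],  # self- or environmental feedback on an answer
    task: str,
    max_iters: int = 5,
) -> list[tuple[str, str]]:
    """Run repeated attempts at a task, conditioning each on accumulated experience."""
    experience: list[tuple[str, str]] = []  # experiential traces: (attempt, feedback)
    for _ in range(max_iters):
        # Build the prompt from the task plus all prior attempts and feedback,
        # so each new attempt can draw on the accumulated experience.
        history = "\n".join(f"Attempt: {a}\nFeedback: {f}" for a, f in experience)
        prompt = f"{task}\n{history}\nGive an improved answer."
        attempt = query_model(prompt)
        feedback = get_feedback(attempt)  # e.g., self-critique or test pass rate
        experience.append((attempt, feedback))
    return experience
```

In this reading, zero-shot inference corresponds to `max_iters = 1` with the feedback discarded; CoE extends it by feeding each attempt's feedback back into the next prompt.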
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models for NLP, Generation, Language Modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 8081