AEL: Agent Evolving Learning for Open-Ended Environments

ACL ARR 2026 May Submission14140 Authors

26 May 2026 (modified: 02 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0
Keywords: LLM Agents
Abstract: LLM agents accumulate experience but rarely learn how to use it: which memories to retrieve, when retrieved evidence is misleading, and when the retrieval strategy itself should change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that recasts memory use as online policy selection. On the fast timescale, a Thompson Sampling bandit selects among memory-retrieval policies episode by episode. On the slow timescale, LLM reflection follows a diagnose-before-prescribe principle: it first explains why performance degraded, then, when the current policy pool plateaus, injects a targeted new retrieval policy as a bandit arm. AEL outperforms ten self-improving and non-LLM baselines on a sequential portfolio benchmark, raising the Sharpe ratio by 27% over the strongest memory-only variant while showing the lowest variance of all stochastic methods. It also generalizes to a support-ticket routing stream, where it improves accuracy by 18% over reflection-free Thompson Sampling and by 51% over the best prior baseline. Mechanism studies further show that the gains are causal: reflection helps precisely when regimes demand different retrieval behavior, and provably neither helps nor harms when the best policy is stable.
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: Evolving Learning, LLM Agents
Contribution Types: NLP engineering experiment
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 14140
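The abstract describes the two timescales only at a high level. The sketch below illustrates one way a Thompson Sampling selector over memory-retrieval policies could accept a reflection-proposed arm when rewards plateau. It is not the authors' implementation. Several details are assumptions made for illustration: the Beta-Bernoulli posterior, rewards rescaled to [0, 1] and counted as fractional successes, the window-based plateau test, and every name (`ThompsonPolicySelector`, `plateaued`, `run`, `reflect`). In particular, `reflect` is a hypothetical stand-in for the LLM diagnose-before-prescribe step.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Arm:
    """One memory-retrieval policy with a Beta(alpha, beta) posterior (assumed model)."""
    name: str
    alpha: float = 1.0  # 1 + accumulated (fractional) successes
    beta: float = 1.0   # 1 + accumulated (fractional) failures


class ThompsonPolicySelector:
    """Fast timescale: hypothetical Beta-Bernoulli Thompson Sampling over policies."""

    def __init__(self, policy_names: List[str], seed: int = 0):
        self.arms = [Arm(n) for n in policy_names]
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one sample per arm from its posterior and play the argmax.
        samples = [self.rng.betavariate(a.alpha, a.beta) for a in self.arms]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, idx: int, reward: float) -> None:
        # Assumption: reward is clipped to [0, 1] and counted as a fractional success.
        r = min(max(reward, 0.0), 1.0)
        self.arms[idx].alpha += r
        self.arms[idx].beta += 1.0 - r

    def add_arm(self, name: str) -> None:
        # Slow timescale: a reflection-proposed policy enters with a flat prior.
        self.arms.append(Arm(name))


def plateaued(rewards: List[float], window: int = 20, eps: float = 0.01) -> bool:
    """Hypothetical plateau test: the latest window's mean fails to beat the previous one by eps."""
    if len(rewards) < 2 * window:
        return False
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    return recent - previous < eps


def run(selector: ThompsonPolicySelector, episodes: int,
        env_step: Callable[[str], float],
        reflect: Callable[[List[float]], str],
        min_gap: int = 40) -> List[float]:
    """Interleave per-episode bandit updates with occasional reflection-driven arm injection."""
    rewards: List[float] = []
    last_reflection = -min_gap
    for t in range(episodes):
        idx = selector.select()
        r = env_step(selector.arms[idx].name)
        selector.update(idx, r)
        rewards.append(r)
        # Reflect only on a plateau, and at most once every min_gap episodes.
        if plateaued(rewards) and t - last_reflection >= min_gap:
            selector.add_arm(reflect(rewards))
            last_reflection = t
    return rewards
```

In a toy run, `env_step` would return a reward for the chosen policy and `reflect` would return the name of a new policy. Keeping reflection off the per-episode path mirrors the two-timescale split: the bandit absorbs noise cheaply, and the costly LLM call fires only when the existing pool stops improving.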