CoEvo-RL: Co-Evolving Policies and Experiential Memories for LLM Agents

ACL ARR 2026 May Submission16360 Authors

26 May 2026 (modified: 02 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0

Keywords: Agent, Memory, Reinforcement Learning

Abstract: Experience-augmented LLM agents can learn from both parametric reinforcement learning and explicit trajectory-level memories, but existing methods often optimize the policy and the experience bank in isolation. This separation creates a mismatch: as the policy improves, earlier experiences may be internalized and lose marginal value, while a stale or independently updated bank may fail to provide the knowledge the current policy actually needs. We propose CoEvo-RL, a policy-experience co-evolution framework that updates a GRPO-trained policy and an experiential memory bank concurrently. Across AppWorld, $\tau^2$-Bench, and LifelongAgentBench, CoEvo-RL outperforms policy-only RL, experience-only learning, and other experience-augmented baselines, improving average success rate from $0.781$ to $0.820$. Further analyses show faster learning and persistent gains from refreshed experience, indicating that co-evolution improves both final performance and training dynamics.
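
For readers unfamiliar with GRPO, the policy-side signal mentioned in the abstract can be illustrated with a minimal sketch of group-relative advantage normalization. This is the generic GRPO idea only, not the submission's implementation: the function name, the use of the population standard deviation, the `eps` constant, and the example rewards are all assumptions made for illustration, and the abstract does not specify how the experience bank update is coupled to this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts of the same task.

    Hypothetical helper: subtracts the group mean and divides by the group
    (population) standard deviation, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four rollouts of one task with binary success rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every rollout succeeds carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Under these assumptions, successful rollouts receive positive advantages and failed ones negative, while a group with identical rewards yields all-zero advantages.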

Paper Type: Long

Research Area: LLM agents

Research Area Keywords: agent memory; reinforcement learning in agents

Contribution Types: NLP engineering experiment

Languages Studied: English

EMNLP 2026 AI Reviewing Experiment: no

Submission Number: 16360
