Keywords: Agent, Memory, Reinforcement Learning
Abstract: Experience-augmented LLM agents can learn from both parametric reinforcement learning and explicit trajectory-level memories, but existing methods often optimize the policy and the experience bank in isolation. This separation creates a mismatch: as the policy improves, earlier experiences may be internalized and lose marginal value, while a stale or independently updated bank may fail to provide the knowledge the current policy actually needs. We propose \ourwork{}, a policy-experience co-evolution framework that updates a GRPO-trained policy and an experiential memory bank concurrently. Across AppWorld, $\tau^2$-Bench, and LifelongAgentBench, \ourwork{} outperforms policy-only RL, experience-only learning, and other experience-augmented baselines, improving average success rate from $0.781$ to $0.820$. Further analyses show faster learning and persistent gains from refreshed experience, indicating that co-evolution improves both final performance and training dynamics.
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: agent memory; reinforcement learning in agents
Contribution Types: NLP engineering experiment
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: no
Submission Number: 16360