MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

Published: 06 Jan 2026 · Last Modified: 09 May 2026 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: A hallmark of human intelligence is the ability to self-evolve, mastering new skills by learning from past experience. Current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and can introduce destabilizing weight updates that raise the risk of catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning (the frozen base model) from plastic memory (the episodic store), MemRL employs a Two-Phase Retrieval mechanism that filters noise and identifies high-utility strategies through environmental feedback. Extensive experiments on Humanity's Last Exam, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, suggesting that it offers a practical way to balance stability and plasticity during runtime improvement.
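To make the Two-Phase Retrieval idea concrete, the sketch below illustrates one plausible reading of the abstract: phase one retrieves a broad candidate pool by semantic similarity, phase two re-ranks those candidates by a utility score that is updated at runtime from environmental reward, with no change to model weights. Everything here is an assumption for illustration; the class names (`EpisodicMemory`, `Episode`), the exponential-moving-average utility update, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    """One stored experience; hypothetical structure, not the paper's schema."""
    text: str            # the strategy or trajectory snippet to reuse
    embedding: list      # semantic embedding of the episode
    utility: float = 0.0  # running value estimate learned from reward
    visits: int = 0      # how often this episode has been retrieved


def cosine(a, b):
    # Standard cosine similarity, used here as the phase-one semantic score.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)


class EpisodicMemory:
    def __init__(self, alpha=0.3, top_k=10, top_m=3):
        self.entries: list[Episode] = []
        self.alpha = alpha  # learning rate for utility updates (assumed)
        self.top_k = top_k  # phase 1: size of the semantic candidate pool
        self.top_m = top_m  # phase 2: final utility-filtered picks

    def retrieve(self, query_emb):
        # Phase 1: passive semantic matching -> broad, possibly noisy pool.
        pool = sorted(
            self.entries,
            key=lambda e: cosine(query_emb, e.embedding),
            reverse=True,
        )[: self.top_k]
        # Phase 2: re-rank by learned utility so similar-but-unhelpful
        # entries are filtered out in favor of high-utility strategies.
        return sorted(pool, key=lambda e: e.utility, reverse=True)[: self.top_m]

    def update(self, used: list, reward: float):
        # Runtime RL step: nudge each used entry's utility toward the
        # environmental reward (a simple bandit-style EMA update).
        for e in used:
            e.visits += 1
            e.utility += self.alpha * (reward - e.utility)
```

Under this reading, all plasticity lives in the `utility` field of the memory store, while the reasoning model stays fixed, which is one way the stability/plasticity balance described above could be realized in practice.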