MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

ICLR 2026 Conference Submission 11163 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026 · ICLR 2026 · CC BY 4.0
Keywords: LLM, memory, agent, RLVR
TL;DR: We propose MemAgent, a novel agent workflow for long-text processing that demonstrates exceptional extrapolation and performance on large-scale tasks after RL training.
Abstract: Despite improvements from length extrapolation, efficient attention, and memory modules, handling infinitely long documents without performance degradation during extrapolation remains the ultimate challenge in long-text processing. To solve this problem, we introduce a novel agent workflow, MemAgent, which processes text in segments and updates memory through an overwrite strategy, addressing long-context tasks through enhanced memory management. We further extend the DAPO algorithm to directly optimize memory ability in an end-to-end fashion, facilitating training via independent-context multi-conversation generation. Experimental results demonstrate that MemAgent has superb long-context capabilities: it extrapolates from an 8K context to a 3.5M-token QA task with a performance loss of less than 10% and achieves over 95% on the 512K NIAH test.
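The described workflow reduces to a simple loop: read one segment at a time alongside the current memory, and let the model rewrite the memory from scratch at every step. Below is a minimal sketch of that loop, assuming a generic chat-completion callable; `call_llm`, the prompt wording, and the 8K chunk size are illustrative stand-ins, not the paper's exact interface or templates.

```python
# Minimal sketch of the segment-and-overwrite memory workflow described in
# the abstract. `call_llm` is a hypothetical stand-in for any text-in,
# text-out LLM API; prompts and chunk size are illustrative assumptions.
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 8000) -> List[str]:
    """Split the document into fixed-size segments."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def mem_agent_answer(
    document: str,
    question: str,
    call_llm: Callable[[str], str],
    chunk_size: int = 8000,
) -> str:
    memory = ""  # fixed-size memory, fully overwritten at every step
    for segment in chunk_text(document, chunk_size):
        # Each call runs in an independent context: only the current memory
        # and one segment are visible, so per-call context stays bounded
        # no matter how long the document is.
        memory = call_llm(
            f"Question: {question}\n"
            f"Current memory:\n{memory}\n"
            f"New text segment:\n{segment}\n"
            "Rewrite the memory, keeping only information needed to answer "
            "the question. Output the new memory only."
        )
    # The final answer is produced from the compressed memory alone.
    return call_llm(
        f"Question: {question}\nMemory:\n{memory}\nAnswer the question."
    )
```

Because every step sees only the current memory and one segment, context length per call is constant in document length; under the training scheme the abstract describes, each such conversation would presumably be generated as an independent sample, with the end-to-end answer reward used to optimize the memory-rewriting behavior.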
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11163