INFMEM: Learning System-2 Memory Control for Long-Context Agents
Keywords: Long Context; Question and Answer; Memory Compression
TL;DR: We propose InfMem, a bounded-memory agent trained via RL to exercise System-2 control (PreThink-Retrieve-Write) over long documents, significantly improving multi-hop QA accuracy while reducing inference cost through early stopping.
Abstract: Reasoning over ultra-long documents requires synthesizing sparse evidence scattered across distant segments under strict memory constraints. While streaming agents enable scalable processing, their passive memory update strategy often fails to preserve low-salience bridging evidence required for multi-hop reasoning. We propose InfMem, a control-centric agent that instantiates System-2-style control via a PreThink–Retrieve–Write protocol. InfMem actively monitors evidence sufficiency, performs targeted in-document retrieval, and applies evidence-aware joint compression to update a bounded memory. To ensure reliable control, we introduce a practical SFT→RL training recipe that aligns retrieval, writing, and stopping decisions with end-task correctness. On ultra-long QA benchmarks from 32k to 1M tokens, InfMem consistently outperforms MemAgent across backbones. Specifically, InfMem improves average absolute accuracy by +10.17, +11.84, and +8.23 points on Qwen3-1.7B, Qwen3-4B, and Qwen2.5-7B, respectively, while reducing inference time by 3.9× on average (up to 5.1×) via adaptive early stopping.
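To make the abstract's control protocol concrete, here is a minimal, hypothetical sketch of a PreThink–Retrieve–Write loop with adaptive early stopping. All names and function bodies below are toy stand-ins invented for illustration (keyword matching instead of learned policies, a character budget instead of a token budget); they are not the paper's actual models or training recipe.

```python
# Hypothetical sketch of the PreThink-Retrieve-Write control loop.
# Every component here is a toy stand-in, not the paper's method.

MEMORY_BUDGET = 200  # stand-in for a bounded token budget (characters here)

def prethink(question: str, memory: str) -> bool:
    """Toy sufficiency monitor: is every content word of the question
    already covered by the bounded memory?"""
    needed = [w for w in question.lower().split() if len(w) > 3]
    return all(w in memory.lower() for w in needed)

def retrieve(question: str, segments: list[str], cursor: int):
    """Toy targeted in-document retrieval: scan forward for the next
    segment sharing at least one word with the question."""
    words = set(question.lower().split())
    for i in range(cursor, len(segments)):
        if words & set(segments[i].lower().split()):
            return i, segments[i]
    return len(segments), ""

def write(memory: str, evidence: str) -> str:
    """Toy evidence-aware compression: append new evidence, then
    truncate to the memory budget (a real system would compress jointly)."""
    merged = (memory + " " + evidence).strip()
    return merged[-MEMORY_BUDGET:]

def infmem_answer(question: str, segments: list[str]) -> str:
    """Stream over document segments under a fixed memory bound,
    stopping early once the monitor judges the evidence sufficient."""
    memory, cursor = "", 0
    while cursor < len(segments):
        if prethink(question, memory):  # sufficiency check -> early stop
            break
        hit, evidence = retrieve(question, segments, cursor)
        cursor = hit + 1
        memory = write(memory, evidence)
    return memory  # a downstream QA model would answer from this memory
```

In this sketch the early-stop check runs before each retrieval step, so once the memory covers the question the remaining segments are never read; this is the mechanism behind the abstract's claimed inference-time savings, here reduced to a trivial keyword test.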
Submission Number: 78