Self-Sum: Teaching the Agent Itself to Decide When and What to Summarize

ACL ARR 2026 January Submission 4874 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Long-horizon Agents, Context Management
Abstract: Long-horizon agents operate over extended sequences of reasoning and actions, but this inevitably accumulates context noise, resulting in excessive computational cost and information overload. Existing approaches commonly rely on fixed, rule-based summarization strategies (e.g., summarizing every few steps), which are inflexible, lack generalization, and often introduce irreversible information loss. We propose *Self-Sum*, a framework that empowers agents to autonomously decide when and what to summarize by modeling summarization as a first-class internal cognitive action, unified with external environmental actions within a multi-turn decision-making process. Specifically, we introduce a two-stage training recipe consisting of (i) a cold-start supervised fine-tuning stage that bootstraps summarization behavior, and (ii) a lightweight, summarization-aware reinforcement learning stage that refines summarization timing and content while discouraging unnecessary summaries. Experiments on multiple long-horizon benchmarks show that *Self-Sum* consistently outperforms no-summarization and rule-based baselines, with particularly strong gains in generalization. Analysis further reveals that *Self-Sum* learns to summarize sparsely at meaningful moments and preserves task-relevant information, highlighting the importance of jointly learning when and what to summarize for robust long-horizon agent behavior.
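The abstract's core idea, summarization as a first-class internal action sitting alongside external environment actions in one decision loop, can be illustrated with a minimal sketch. All names below (`run_episode`, `summarize_fn`, the `"summarize"` action label) are hypothetical stand-ins, not the paper's actual interface:

```python
# Hypothetical sketch of a unified decision loop in which the policy
# may emit an internal "summarize" action that compresses its own
# context, in addition to ordinary external environment actions.
# Names and signatures are illustrative assumptions, not from the paper.

def run_episode(policy, env_step, summarize_fn, max_turns=10):
    """Roll out an agent whose action space includes an internal
    'summarize' action. Returns the final context and the number of
    summaries taken (summaries are discouraged from being frequent)."""
    context = []          # accumulated (action, observation) history
    n_summaries = 0
    for _ in range(max_turns):
        action = policy(context)
        if action == "summarize":
            # Internal cognitive action: replace the accumulated context
            # with a compact summary. This is irreversible, which is why
            # the policy must learn to invoke it sparsely and at
            # meaningful moments.
            context = [("summary", summarize_fn(context))]
            n_summaries += 1
        else:
            # External environmental action: act, then append the
            # resulting observation to the context as usual.
            obs, done = env_step(action)
            context.append((action, obs))
            if done:
                break
    return context, n_summaries
```

A toy policy that summarizes whenever the context exceeds three entries would, under this sketch, compress its history once every few turns; the RL stage described in the abstract would instead learn that threshold (and the summary content) from reward.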
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents, environment interaction, reinforcement learning in agents
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4874