Keywords: Workflow Optimization, Agent Reasoning, Context Management
TL;DR: This paper proposes CSIM, an end-to-end context management method for LLM agents: it compresses the context after each step and updates the plan to avoid forgetting, addressing long-context issues, outperforming baselines, and boosting performance in multi-tool scenarios.
Abstract: Large Language Model (LLM) agents excel at tasks such as translation, code generation, and decision-making, but consecutive tool calls in complex scenarios produce excessively long contexts. Even though SOTA LLMs offer context windows of 128K+ tokens, interactions involving unstructured data easily exceed these limits, harming task focus and increasing resource costs.
Existing solutions have flaws: forced truncation causes information loss, external memory modules lack end-to-end optimization, and context summarization wastes the KV cache while still losing information.
To address this, we propose Compressed Step Information Memory (CSIM), an end-to-end context management method. It compresses the context after each step to minimize information loss, and restates and updates the plan to avoid forgetting and to correct errors. Trained via SFT and RL, CSIM achieves strong performance on GAIA and BrowseComp.
Our contributions are: (1) CSIM, which boosts performance in multi-tool scenarios; (2) a data synthesis and SFT/RL training framework that distills SOTA agent capabilities; (3) experiments validating the method on multiple benchmarks.
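To make the abstract's mechanism concrete, the following is a minimal sketch of the "compress post-step context, then restate/update the plan" loop it describes. This is not the authors' code: the `llm` callable and all function names (`compress_step`, `update_plan`, `run_agent`) are illustrative assumptions.

```python
# Minimal sketch of post-step context compression plus plan restating,
# assuming a generic chat-completion callable `llm(messages) -> str`.
# All names here are hypothetical, not the paper's implementation.
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
LLM = Callable[[List[Message]], str]

def compress_step(llm: LLM, tool_name: str, tool_output: str) -> str:
    """Keep only the task-relevant facts from one tool result."""
    return llm([
        {"role": "system", "content": "Summarize the tool output, keeping only facts needed to finish the task."},
        {"role": "user", "content": f"Tool `{tool_name}` returned:\n{tool_output}"},
    ])

def update_plan(llm: LLM, task: str, plan: str, latest_summary: str) -> str:
    """Restate and revise the plan so earlier goals are not forgotten."""
    return llm([
        {"role": "system", "content": "Rewrite the plan: keep remaining steps, mark finished ones, correct mistakes."},
        {"role": "user", "content": f"Task: {task}\nCurrent plan: {plan}\nLatest step summary: {latest_summary}"},
    ])

def run_agent(llm: LLM, task: str, steps: List[Tuple[str, str]]) -> List[Message]:
    """Replay (tool_name, tool_output) steps while keeping a compact context."""
    plan = llm([{"role": "user", "content": f"Write a short step-by-step plan for: {task}"}])
    context: List[Message] = [{"role": "user", "content": task}]
    for tool_name, tool_output in steps:
        summary = compress_step(llm, tool_name, tool_output)  # compress post-step context
        plan = update_plan(llm, task, plan, summary)          # restate/update the plan
        # Only the compressed summary and the refreshed plan enter the context,
        # so the prompt stays short regardless of how verbose the raw tool outputs are.
        context.append({"role": "assistant", "content": f"[step: {tool_name}] {summary}"})
        context.append({"role": "assistant", "content": f"[plan] {plan}"})
    return context
```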
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: true
Submission Guidelines: true
Anonymous Url: true
No Acknowledgement Section: true
Submission Number: 17735