COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context

19 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Context Engineering, Multi-Agent systems, Long Horizon Tasks, LLM Agents
TL;DR: We introduce COMPASS, a dual-loop multi-agent framework that uses context management and strategic oversight to make LLM agents reliable on long-horizon tasks.
Abstract: Long-horizon tasks requiring many rounds of reasoning and tool use remain challenging for LLM agents: small mistakes compound across steps, and even state-of-the-art models can produce unexpected or hallucinated tool outputs. We identify ineffective context management as the core bottleneck: as execution unfolds, unstructured histories cause agents to overlook critical evidence or become overwhelmed by irrelevant information. To address this, we introduce COMPASS (Context-Organized Multi-Agent Planning and Strategy System), a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context management into three specialized components: (1) a Main Agent that executes reasoning and tool calls, (2) a Meta-Thinker that monitors execution and issues strategic signals, and (3) a Context Manager that maintains concise, strategically relevant summaries. This design preserves single-agent fluidity while enabling adaptive context organization throughout execution. Across three challenging benchmarks—GAIA, BrowseComp, and Humanity's Last Exam—COMPASS improves accuracy by over 10% compared to both single- and multi-agent baselines. Ablation studies confirm that the designed components are crucial for long-horizon reasoning; test-time scaling extensions boost performance by up to 20%, matching established DeepResearch agents; and a post-training optimization pipeline improves token efficiency by 25%.
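The dual-loop separation of concerns described in the abstract could be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: all names (`ContextManager`, `MetaThinker`, `MainAgent`, `run_compass_like`) are hypothetical stand-ins, and simple heuristics and string stubs take the place of real LLM and tool calls.

```python
class ContextManager:
    """Keeps a concise rolling summary instead of the full raw history (hypothetical)."""
    def __init__(self, max_items: int = 3):
        self.summary: list[str] = []
        self.max_items = max_items

    def update(self, event: str) -> None:
        # Retain only the most recent strategically relevant items.
        self.summary = (self.summary + [event])[-self.max_items:]


class MetaThinker:
    """Monitors execution via the managed context and issues strategic signals (hypothetical)."""
    def signal(self, summary: list[str]) -> str:
        # Toy heuristic: ask for a replan if repeated failures appear in the summary.
        failures = [e for e in summary if "FAIL" in e]
        return "replan" if len(failures) >= 2 else "continue"


class MainAgent:
    """Executes one tactical step; a stub standing in for LLM reasoning + tool calls."""
    def step(self, task: str, plan: str, t: int) -> str:
        return f"step {t}: executed {plan} for {task!r}"


def run_compass_like(task: str, steps: int = 5) -> list[tuple[str, str]]:
    """Inner loop: tactical execution. Outer loop: oversight over summarized context."""
    ctx, meta, agent = ContextManager(), MetaThinker(), MainAgent()
    plan, trace = "plan-A", []
    for t in range(steps):
        result = agent.step(task, plan, t)   # tactical execution
        ctx.update(result)                   # context management
        if meta.signal(ctx.summary) == "replan":
            plan = "plan-B"                  # strategic oversight intervenes
        trace.append((plan, result))
    return trace
```

The point of the structure is that the Meta-Thinker never sees the raw history, only the Context Manager's bounded summary, which is the abstract's claimed remedy for agents drowning in irrelevant information.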
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15727