Lost in the Maze: Overcoming Context Limitations in Long-Horizon Agentic Search

Published: 23 May 2026, Last Modified: 23 May 2026, ICML 2026 AIWILD, CC BY 4.0
Keywords: search agents, context management, long-horizon
Abstract: Long-horizon agentic search requires iteratively exploring the web over long trajectories and synthesizing information across many sources, enabling powerful applications like deep research systems. In this work, we show that popular agentic search frameworks struggle to scale to long trajectories primarily due to context limitations: they accumulate long, noisy content, hit context window and tool budgets, or stop early. We then introduce SLIM (Simple Lightweight Information Management), a simple framework that separates retrieval into distinct search and browse tools and periodically summarizes the trajectory, keeping context concise while enabling longer, more focused searches. Across a wide range of long-horizon tasks, SLIM achieves comparable performance at substantially lower cost and with far fewer tool calls than strong open-source frameworks, with both proprietary and open-weight models, including RL-trained models for deep research. Specifically, with o3 as the base model, SLIM achieves 56% on BrowseComp and 33% on HLE, outperforming all open-source frameworks by 8 and 6 absolute points, respectively, while incurring 4–6x fewer tool calls. With GLM-4.7 Flash, SLIM achieves a 10-point improvement on BrowseComp over the next best open-source framework, Search-o1, at a third of the cost. To systematically understand failure modes in long-horizon agentic search, we develop an automated fine-grained trajectory analysis pipeline and error taxonomy, and find that SLIM exhibits significantly fewer hallucinations than prior systems. We hope our analysis framework and simple tool design inform future long-horizon agents.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 55
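
Illustrative sketch (not from the paper): the abstract describes SLIM as separating retrieval into distinct search and browse tools and periodically summarizing the trajectory. The Python below is one minimal way such a loop could be organized, under stated assumptions. Every name here (AgentState, run_agent, summarize_every, max_tool_calls, and the toy_* stubs) is hypothetical, the stubs stand in for a real model and real web tools, and the paper's actual interfaces and summarization schedule may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container for what the model sees at each step."""
    question: str
    context: list = field(default_factory=list)  # working context
    tool_calls: int = 0


def run_agent(question, policy, search, browse, summarize,
              summarize_every=4, max_tool_calls=20):
    """Assumed loop: separate search/browse tools plus periodic summarization."""
    state = AgentState(question)
    while state.tool_calls < max_tool_calls:
        action, arg = policy(state)
        if action == "answer":
            return arg, state
        if action == "search":
            observation = search(arg)   # assumed to return short snippets only
        elif action == "browse":
            observation = browse(arg)   # assumed to return one page's content
        else:
            raise ValueError(f"unknown action: {action}")
        state.tool_calls += 1
        state.context.append(f"{action}({arg!r}) -> {observation}")
        # Periodically replace the accumulated trajectory with one summary
        # so the working context stays short.
        if state.tool_calls % summarize_every == 0:
            state.context = [summarize(state.question, state.context)]
    return None, state  # tool budget exhausted without an answer


# Toy stubs standing in for a real model and real web tools.
def toy_search(query):
    return f"snippets for {query}"


def toy_browse(url):
    return f"page text of {url}"


def toy_summarize(question, context):
    return f"summary of {len(context)} steps"


def toy_policy(state):
    if state.tool_calls >= 5:
        return ("answer", "done")
    if state.tool_calls % 2 == 0:
        return ("search", "some query")
    return ("browse", "http://example.com")


answer, final_state = run_agent(
    "toy question", toy_policy, toy_search, toy_browse, toy_summarize,
    summarize_every=2,
)
```

In this toy run the context is collapsed after every second tool call, so the final context holds one summary plus the most recent tool result rather than the full trajectory.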