SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation

Published: 23 May 2026, Last Modified: 23 May 2026, ICML 2026 AIWILD, License: CC BY 4.0
Keywords: agentic system, fault attribution, evaluation, long-horizon
TL;DR: We propose SAFARI, which performs high-precision fault attribution on agentic trajectories through an active investigation loop, offering a robust alternative to loading long traces into an LLM's context window.
Abstract: As autonomous agents tackle increasingly complex multi-step, multi-agent tasks, their execution trajectories have grown beyond even the largest context windows. Current methods for diagnosing agent failures rely on loading the full trajectory into an LLM's context window. This approach suffers from attention dilution and does not scale once traces grow past the largest context limit available to a single LLM. To address this, we introduce SAFARI (Scaling long-horizon Agentic Fault AttRibution via active Investigation), a framework that replaces linear context loading with a tool-augmented diagnostic loop. By equipping an LLM with a specialized toolbox for reading and searching trajectory segments, together with a persistent Short-Term Memory (STM) for cross-turn reasoning, SAFARI decouples diagnostic accuracy from architectural context limits. In our experiments, SAFARI outperforms state-of-the-art results by 20% on the Who&When dataset within a 1M-token budget and by 19% on the TRAIL GAIA subset within a 25K-token budget. Most notably, SAFARI maintains a precision of 0.58 even when the target fault lies 5x beyond the model's native context window, a setting in which traditional evaluators fail entirely.
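The abstract describes the investigation loop only at a high level. The Python sketch below illustrates the general pattern of tool-augmented fault attribution with a persistent memory; it is not SAFARI's implementation. The step schema (`Step`), the two tools (a regex `search` and a bounded `read`), the structured STM notes, the whitespace-based token proxy, and the rule-based `make_toy_policy` that stands in for the LLM are all hypothetical, and the sketch assumes each step's `index` equals its position in the list.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One step of an agent trajectory (hypothetical schema)."""
    index: int  # assumed to equal the step's position in the list
    agent: str
    content: str


@dataclass
class ShortTermMemory:
    """Notes that persist across diagnostic turns (illustrative STM)."""
    notes: List[dict] = field(default_factory=list)


def run_investigation(
    steps: List[Step],
    policy: Callable[[ShortTermMemory], Tuple],
    max_turns: int = 8,
    token_budget: int = 500,
) -> Optional[int]:
    """Tool-augmented loop: the policy sees only tool outputs stored in STM."""
    stm = ShortTermMemory()
    tokens_used = 0
    for _ in range(max_turns):
        action = policy(stm)
        if action[0] == "answer":
            return action[1]
        if action[0] == "search":
            # Hypothetical tool 1: regex search that returns step indices only.
            rx = re.compile(action[1], re.IGNORECASE)
            result = [s.index for s in steps if rx.search(s.content)]
            cost = len(result)
        elif action[0] == "read":
            # Hypothetical tool 2: read a bounded segment [start, end).
            start, end = max(0, action[1]), min(len(steps), action[2])
            result = steps[start:end]
            cost = sum(len(s.content.split()) for s in result)
        else:
            raise ValueError(f"unknown action: {action[0]!r}")
        tokens_used += cost  # crude whitespace-token proxy for context spent
        if tokens_used > token_budget:
            return None  # budget exhausted before any attribution
        stm.notes.append({"action": action, "result": result})
    return None


def make_toy_policy(symptom: str, suspect: str, window: int = 3):
    """Rule-based stand-in for the LLM (hypothetical heuristic)."""
    def policy(stm: ShortTermMemory) -> Tuple:
        if not stm.notes:
            return ("search", symptom)  # locate where the failure surfaces
        last = stm.notes[-1]
        if last["action"][0] == "search":
            if not last["result"]:
                return ("answer", None)  # no symptom found: no attribution
            first_hit = last["result"][0]
            return ("read", first_hit - window, first_hit + 1)
        # Last turn was a read: blame the earliest step matching `suspect`,
        # falling back to the first step of the inspected window.
        for s in last["result"]:
            if re.search(suspect, s.content, re.IGNORECASE):
                return ("answer", s.index)
        return ("answer", last["result"][0].index)
    return policy


trace = [
    Step(0, "Orchestrator", "Plan: find the release year of the dataset"),
    Step(1, "WebSurfer", "Opened the dataset homepage"),
    Step(2, "WebSurfer", "I assume the release year is 1998 without checking"),
    Step(3, "Coder", "Computed age using 1998"),
    Step(4, "Verifier", "Error: final answer does not match reference"),
]
fault = run_investigation(trace, make_toy_policy(r"error", r"assume"))
print(fault)  # 2
```

The design point the sketch illustrates is that the policy never receives the full trace: it sees only the search hits and bounded segments accumulated in STM, so the context it consumes scales with what it chooses to inspect rather than with trajectory length.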
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 121