What Limits Agentic Systems Efficiency?

Song Bian; Minghao Yan; Anand Jayarajan; Gennady Pekhimenko; Shivaram Venkataraman

What Limits Agentic Systems Efficiency?

Song Bian, Minghao Yan, Anand Jayarajan, Gennady Pekhimenko, Shivaram Venkataraman

Published: 28 Sept 2025, Last Modified: 17 Oct 2025SEA @ NeurIPS 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Efficiency; Agentic Systems

Abstract: Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated strong reasoning capabilities. To further enhance this process, recent agentic systems, such as Deep Research, incorporate web interactions into LLM reasoning to mitigate uncertainties and reduce potential errors. However, existing research predominantly focuses on reasoning performance, often neglecting the efficiency of these systems. In this work, we present a comprehensive empirical study that identifies efficiency bottlenecks in web-interactive agentic systems. We decompose end-to-end latency into two primary components: LLM API latency and web environment latency. Our findings show that both components significantly contribute to the overall system latency. To improve latency, we propose SpecCache, a caching framework augmented with speculative execution to reduce web environment overhead. Extensive evaluations on two standard benchmarks show that our approach improves the cache hit rate by up to $54\times$ compared to a random caching strategy, while reducing web environment overhead by up to $3.2\times$, without degrading agentic system performance.

Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.

Submission Number: 22

Loading