Speculative Actions: A Lossless Framework for Faster AI Agents

ICLR 2026 Conference Submission 22399 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: AI Agents, Speculative Decoding, Parallel Execution, Agentic Serving, Agentic Simulation
TL;DR: We introduce speculative actions, a lossless framework that predicts likely actions using faster models, enabling multiple API calls to be executed in parallel and thus yielding substantial acceleration.
Abstract: AI agents have attracted growing interest across industry and academia, but in practice their execution can be slow. For example, letting two state-of-the-art agents play a game of chess may take hours. A key bottleneck is that agent behavior unfolds sequentially: each action requires an API call, and these calls can be time-consuming. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, we propose speculative actions, a lossless framework that predicts likely actions using faster models, enabling multiple API calls to be executed in parallel. We evaluate this framework across four agentic environments: gaming, e-commerce, web search, and operating systems. In all cases, speculative actions yield substantial acceleration, with potential speedups of up to 30%. Moreover, performance can be further improved through stronger guessing models and top-K action prediction, opening a promising path toward efficient real-world deployment of AI agents.
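The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `slow_actor`, `fast_guesser`, and `api_call` are hypothetical stand-ins with made-up latencies, and the sketch assumes the speculated API call has no side effects, so a wrong guess can simply be discarded. While the slow, authoritative model decides the real action, a cheap guesser predicts it and the follow-up API call is launched early. The speculative result is used only when the guess matches, so the output is identical to sequential execution.

```python
import asyncio


async def slow_actor(state: int) -> str:
    """Hypothetical authoritative (slow) model that decides the real action."""
    await asyncio.sleep(0.02)
    return f"move-{state % 3}"


async def fast_guesser(state: int) -> str:
    """Hypothetical cheap predictor; deliberately wrong when state % 4 == 3."""
    await asyncio.sleep(0.002)
    offset = 1 if state % 4 == 3 else 0
    return f"move-{(state + offset) % 3}"


async def api_call(state: int, action: str) -> str:
    """Hypothetical API call triggered by an action (assumed side-effect free)."""
    await asyncio.sleep(0.02)
    return f"result({state},{action})"


async def sequential_step(state: int) -> str:
    # Baseline: wait for the real action, then issue the API call.
    action = await slow_actor(state)
    return await api_call(state, action)


async def speculative_step(state: int) -> tuple[str, bool]:
    # Start the slow decision in the background.
    actor_task = asyncio.create_task(slow_actor(state))
    # Guess the action quickly and launch its API call right away.
    guess = await fast_guesser(state)
    spec_task = asyncio.create_task(api_call(state, guess))
    # Verify the guess against the authoritative action.
    action = await actor_task
    if action == guess:
        return await spec_task, True  # hit: the early result is reused
    spec_task.cancel()  # miss: discard the speculative work
    return await api_call(state, action), False


async def run_sequential(states) -> list[str]:
    return [await sequential_step(s) for s in states]


async def run_speculative(states) -> tuple[list[str], int]:
    results, hits = [], 0
    for s in states:
        result, hit = await speculative_step(s)
        results.append(result)
        hits += hit
    return results, hits


if __name__ == "__main__":
    seq = asyncio.run(run_sequential(range(8)))
    spec, hits = asyncio.run(run_speculative(range(8)))
    print("identical results:", spec == seq, "| correct guesses:", hits)
```

On a correct guess the API call overlaps with the slow decision. On a wrong guess the step costs about the same as the sequential baseline, plus the wasted speculative call. A top-K variant would launch one speculative call per candidate action and keep whichever matches.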
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 22399