AGENT*: Optimizing Test-Time Compute for Multi-Agent Systems with Modularized Collaboration

ICLR 2026 Conference Submission21185 Authors

19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Multi-Agent System, Test-time Scaling
Abstract: Scaling test-time computation has emerged as a powerful and increasingly popular approach for improving the performance of large language models without additional training. Recent work demonstrates that techniques such as repeated sampling, self-verification, and self-reflection can significantly enhance task success by allocating more inference-time compute. However, applying these techniques directly to multi-agent systems is challenging, as they provide no principled way to encourage collaboration or to manage compute allocation across multiple agents under budget constraints. To address this, we propose AGENT*, a general framework for enabling effective multi-agent collaboration while operating within strict compute budgets. AGENT* introduces the notion of \emph{modularized collaboration}, formalized as callable functions that encapsulate reusable multi-agent workflows; these modules are constructed automatically via self-play reflection, which abstracts recurring interaction patterns from past trajectories. Building on these collaboration modules, AGENT* employs \emph{a dual-level planning architecture} that optimizes compute allocation by reasoning over the current task state while also \emph{speculating} on future steps. Experiments on complex agent benchmarks demonstrate that AGENT* consistently outperforms baselines across diverse budget settings, validating the effectiveness of multi-agent collaboration for inference-time optimization.
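The abstract's two key ideas — collaboration modules as callable functions with known compute costs, and a planner that allocates a strict budget across them — can be illustrated with a minimal sketch. Everything below is hypothetical: the class and function names, the greedy score-per-cost heuristic, and the illustrative utility scores are assumptions for exposition, not the paper's actual algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: a collaboration module packages a reusable
# multi-agent workflow as a callable with a known compute cost.
@dataclass
class CollaborationModule:
    name: str
    cost: int                  # inference-time compute units the module consumes
    run: Callable[[str], str]  # executes the multi-agent workflow on a task state

def plan_under_budget(modules: List[CollaborationModule],
                      scores: Dict[str, float],
                      budget: int) -> List[str]:
    """Toy budget-aware planner: greedily pick modules with the best
    speculated utility per unit of compute until the budget runs out."""
    chosen: List[str] = []
    remaining = budget
    ranked = sorted(modules, key=lambda m: scores[m.name] / m.cost, reverse=True)
    for m in ranked:
        if m.cost <= remaining:
            chosen.append(m.name)
            remaining -= m.cost
    return chosen

modules = [
    CollaborationModule("debate", cost=4, run=lambda s: s + " [debated]"),
    CollaborationModule("verify", cost=2, run=lambda s: s + " [verified]"),
    CollaborationModule("reflect", cost=3, run=lambda s: s + " [reflected]"),
]
# Speculated utility of each module for the current task state (illustrative).
scores = {"debate": 0.9, "verify": 0.6, "reflect": 0.5}

plan = plan_under_budget(modules, scores, budget=6)
print(plan)  # → ['verify', 'debate']
```

With a budget of 6 units, the planner selects "verify" (ratio 0.30) and "debate" (ratio 0.225) but skips "reflect", since its cost of 3 exceeds the remaining budget of 0. The real dual-level architecture additionally speculates on future steps rather than scoring modules myopically; this sketch only conveys the budget-constrained selection over modules.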
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21185