Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling

Published: 02 Mar 2026, Last Modified: 02 Mar 2026 | MALGAI | Everyone | CC BY 4.0
Keywords: Multi-Agent Systems, Large Language Model
TL;DR: We propose Team of Thoughts, which enables heterogeneous agents to collaborate efficiently through dynamic orchestrator selection and parallel tool-agent execution, achieving superior task performance and cost efficiency.
Abstract: Existing Multi-Agent Systems (MAS) typically rely on static, homogeneous model configurations, limiting their ability to exploit the distinct strengths of differently post-trained models. To address this, we introduce Team-of-Thoughts, a novel MAS architecture that leverages the complementary capabilities of heterogeneous agents via an orchestrator-tool paradigm. Our framework introduces two key mechanisms to optimize performance: (1) an orchestrator calibration scheme that identifies models with superior coordination capabilities, and (2) a self-assessment protocol where tool agents profile their own domain expertise to account for variations in post-training skills. During inference, the orchestrator dynamically activates the most suitable tool agents based on these proficiency profiles. Experiments across five reasoning and code generation benchmarks demonstrate that Team-of-Thoughts achieves a superior accuracy-cost trade-off. Notably, on AIME24, our method reaches 96.67% accuracy, significantly surpassing homogeneous role-play baselines (80%) while reducing computational costs by two orders of magnitude.
Submission Number: 69
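
The inference-time step described in the abstract, where the orchestrator activates the most suitable tool agents based on their proficiency profiles, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `ToolAgent`, `select_agents`, and `run_selected`, the [0, 1] proficiency scale, the top-k selection rule, and the thread-based parallelism are all assumptions made for illustration. Orchestrator calibration and answer aggregation are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolAgent:
    """Hypothetical tool agent with a self-assessed proficiency profile."""
    name: str
    # Self-assessed proficiency per domain, assumed here to lie in [0, 1].
    profile: Dict[str, float]
    # Stand-in for a model call; a real agent would query an LLM here.
    solve: Callable[[str], str]


def select_agents(agents: List[ToolAgent], domain: str, k: int) -> List[ToolAgent]:
    """Return the k agents with the highest self-assessed score for the domain.

    Agents with no score for the domain are treated as scoring 0.0 (assumption).
    """
    ranked = sorted(agents, key=lambda a: a.profile.get(domain, 0.0), reverse=True)
    return ranked[:k]


def run_selected(
    agents: List[ToolAgent], domain: str, task: str, k: int = 2
) -> List[Tuple[str, str]]:
    """Run the selected agents in parallel and return (agent name, answer) pairs."""
    chosen = select_agents(agents, domain, k)
    # max_workers must be positive even when no agent is selected.
    with ThreadPoolExecutor(max_workers=max(1, len(chosen))) as pool:
        answers = list(pool.map(lambda a: a.solve(task), chosen))
    return [(a.name, ans) for a, ans in zip(chosen, answers)]
```

In this sketch, the orchestrator would receive the returned pairs and decide how to combine them. How that combination is done in Team-of-Thoughts is not stated in the abstract, so it is left out.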