Orchestrated Sparse Consortium of Small Experts Beats Monolithic LLMs

ICLR 2026 Conference Submission 16393 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-expert systems, Small Language Models, Model Coordination
Abstract: Large Language Models (LLMs) attain impressive capabilities but demand heavy computation and offer limited transparency. Naively shrinking a model reduces computational overhead yet typically sacrifices breadth and performance; we therefore pursue a different axis: keep models modular and *scale up* by coordinating multiple experts so that a small, task-adaptive subset of experts collaborates on each input. In this paper, we introduce **FOCUS** (*Flexible Orchestration and Collaboration Using Specialists*) -- a *generic* multi-expert collaboration framework that trains a lightweight *orchestrator* under *oracle* supervision to *select, order,* and *coordinate* a consortium of *experts* (homogeneous or heterogeneous language models of any size). A learnable sparse, near-symmetric *collaboration matrix* governs information flow among experts, and a *multi-round refinement* process aggregates intermediate outputs into a single answer; the oracle is used only during training, not at test time. At test time, the orchestrator adaptively routes experts with early stopping, achieving *sublinear cost growth* as consortium size increases. **FOCUS** achieves striking results: on MMLU, GSM8K, and HumanEval, a consortium of 5–7 Qwen experts (combined ~9B parameters) reaches 94.1%, 94.1%, and 87.8% accuracy, respectively, matching or surpassing a Qwen3-14B model, with an average margin of 7.6%. On reasoning benchmarks, a consortium of 5 Phi-4-Mini models improves AIME-2024 from 26% to 40% and GPQA-DIAMOND from 19% to 31%, and attains 92% on MATH-500, exceeding a single Phi-4-14B reasoning model. These results establish *collaboration* as a distinct axis of scaling: carefully orchestrated experts can outperform comparably sized monolithic models while remaining modular and cost-effective for deployment.
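To make the described inference loop concrete, the following is a minimal Python sketch of the mechanism as stated in the abstract: an orchestrator scores and selects a small subset of experts, a sparse near-symmetric collaboration matrix routes drafts among them, and multi-round refinement with early stopping yields one aggregated answer. Every name here (`Expert`, `focus_inference`, `agreement_score`, `aggregate`) is hypothetical and not taken from the paper; this is an illustration under stated assumptions, not the authors' implementation.

```python
# Sketch of a FOCUS-style consortium at inference time (all names hypothetical).
import numpy as np


class Expert:
    """Stand-in for a small language model; returns a refined draft answer."""

    def __init__(self, name):
        self.name = name

    def respond(self, prompt, peer_drafts):
        # A real expert would condition on the prompt plus the peer drafts
        # routed to it by the collaboration matrix.
        return f"{self.name}: draft given {len(peer_drafts)} peer inputs"


def agreement_score(drafts):
    # Placeholder confidence signal; not the paper's early-stopping criterion.
    return 0.5


def aggregate(drafts):
    # Placeholder aggregation; the paper aggregates intermediate outputs
    # into a single final answer.
    return max(drafts.values(), key=len)


def focus_inference(prompt, experts, orchestrator_scores, collab_matrix,
                    max_rounds=3, stop_threshold=0.9, top_k=3):
    """Run the consortium for up to `max_rounds` with early stopping.

    orchestrator_scores: per-expert relevance scores for this prompt (in the
                         paper these come from a trained orchestrator).
    collab_matrix:       sparse, near-symmetric matrix; entry (i, j) > 0 means
                         expert j's draft is routed to expert i next round.
    """
    # 1. Orchestrator selects a small, task-adaptive subset of experts.
    active = np.argsort(orchestrator_scores)[-top_k:]

    drafts = {i: "" for i in active}
    for _ in range(max_rounds):
        new_drafts = {}
        for i in active:
            # 2. Collaboration matrix governs which peer drafts each expert sees.
            peers = [drafts[j] for j in active
                     if j != i and collab_matrix[i, j] > 0]
            new_drafts[i] = experts[i].respond(prompt, peers)
        drafts = new_drafts

        # 3. Early stopping: halt refinement once the consortium is confident,
        #    which keeps cost growth sublinear in consortium size.
        if agreement_score(drafts) >= stop_threshold:
            break

    # 4. Multi-round refinement ends with a single aggregated answer.
    return aggregate(drafts)


if __name__ == "__main__":
    experts = [Expert(f"expert-{i}") for i in range(5)]
    scores = np.random.rand(5)                    # orchestrator relevance scores
    collab = (np.random.rand(5, 5) > 0.6).astype(float)
    collab = np.maximum(collab, collab.T)         # sparse, near-symmetric routing
    print(focus_inference("What is 17 * 24?", experts, scores, collab))
```

The sketch only mirrors the control flow named in the abstract (selection, routing, refinement, early stopping); how the orchestrator and collaboration matrix are trained under oracle supervision is described in the paper itself.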
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16393