On the Impact of Expert Count in Mixture of Experts

Published: 14 Feb 2026, Last Modified: 14 Feb 2026 · MATH4AI @ AAAI 2026 Poster · CC BY 4.0
Keywords: mixture-of-experts, machine learning theory
TL;DR: We derive a range of expert counts that optimizes Mixture-of-Experts (MoE) performance and load balance
Abstract: Mixture-of-Experts (MoE) layers have achieved notable success across various deep learning applications. However, how the number of experts affects MoE performance across different task settings remains poorly understood. In this work, we investigate the effect of expert count in MoE architectures composed of multilayer perceptron (MLP) experts. Concretely, we develop a formal MoE model with MLP experts, derive a range of expert counts that optimizes performance and load balance, and validate it on synthetic data. By systematically varying the number of experts, we demonstrate that balancing specialization against effective expert routing is key to maximizing performance.
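To make the architecture in the abstract concrete, the following is a minimal sketch of an MoE layer with MLP experts and a top-k softmax gate. This is an illustrative assumption of the standard setup, not the paper's formal model: the gating scheme, expert parameterization, and `k` are all hypothetical choices for demonstration.

```python
import numpy as np

def mlp_expert(x, W1, b1, W2, b2):
    """A single two-layer MLP expert with ReLU activation."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

def moe_layer(x, gate_W, experts, k=2):
    """Top-k gated Mixture-of-Experts layer (illustrative sketch).

    x:       (batch, d_in) input
    gate_W:  (d_in, n_experts) gating weights
    experts: list of (W1, b1, W2, b2) parameter tuples, one per expert
    k:       number of experts each input is routed to
    Returns the combined output and the per-row expert assignments,
    which can be inspected to measure load balance across experts.
    """
    logits = x @ gate_W                         # (batch, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]   # top-k expert indices per row
    d_out = experts[0][3].shape[0]              # output dim from b2
    out = np.zeros((x.shape[0], d_out))
    for i in range(x.shape[0]):
        sel = topk[i]
        # softmax over the selected experts' logits only
        w = np.exp(logits[i, sel] - logits[i, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[i] += weight * mlp_expert(x[i], *experts[e])
    return out, topk
```

Counting how often each expert index appears in `topk` gives the load distribution whose balance the paper's expert-count analysis targets.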
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 21