Keywords: mixture-of-experts, machine learning theory
TL;DR: We derive a range of expert counts that optimizes Mixture-of-Experts (MoE) performance and load balance
Abstract: Mixture-of-Experts (MoE) layers have achieved notable success across various deep learning applications. However, the impact of the number of experts on MoE performance across different task settings remains poorly understood. In this work, we investigate how the number of experts affects an MoE architecture composed of multilayer perceptron (MLP) experts. Concretely, we develop a formal MoE model with MLP experts, derive a range of expert counts that optimizes performance and load balance, and validate this range on synthetic data. By systematically varying the number of experts, we demonstrate that balancing expert specialization against effective routing is key to maximizing performance.
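For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of an MoE layer built from MLP experts with softmax gating, including a simple per-expert load statistic as a balance proxy. The class names (MLPExpert, SoftMoE), the dense gating, and the argmax-based load measure are illustrative assumptions for this sketch, not the authors' actual model or derivation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPExpert(nn.Module):
    """A single MLP expert: linear -> ReLU -> linear."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.net(x)

class SoftMoE(nn.Module):
    """Softmax-gated MoE over MLP experts (dense gating, for clarity)."""
    def __init__(self, d_in, d_hidden, d_out, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            MLPExpert(d_in, d_hidden, d_out) for _ in range(num_experts)
        )
        self.gate = nn.Linear(d_in, num_experts)

    def forward(self, x):
        # Gate probabilities per input: (batch, num_experts)
        probs = F.softmax(self.gate(x), dim=-1)
        # Expert outputs stacked along a new axis: (batch, num_experts, d_out)
        outs = torch.stack([e(x) for e in self.experts], dim=1)
        # Gate-weighted combination of expert outputs: (batch, d_out)
        y = (probs.unsqueeze(-1) * outs).sum(dim=1)
        # Load-balance proxy: fraction of inputs whose gate argmax picks
        # each expert (a hypothetical stand-in for the paper's load measure)
        load = torch.bincount(probs.argmax(dim=-1),
                              minlength=len(self.experts)).float() / x.size(0)
        return y, load

# Usage: vary num_experts and inspect the empirical load distribution.
moe = SoftMoE(d_in=8, d_hidden=32, d_out=4, num_experts=4)
x = torch.randn(64, 8)
y, load = moe(x)
print(y.shape, load)  # torch.Size([64, 4]) and per-expert load fractions
```

Sweeping num_experts in a layer like this and tracking both task loss and the load distribution is one way to reproduce, on synthetic data, the specialization-versus-routing trade-off the abstract points to.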
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 21