On Training Mixture-of-Experts: A Social Choice Perspective

ICLR 2026 Conference Submission 22697 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Mixture-of-Experts, Social Choice Theory, Arrow's Impossibility Theorem, Curriculum Learning, Expert Specialization
TL;DR: Viewing MoE training through a social choice lens, we attribute its difficulty to Arrow's Impossibility Theorem and introduce RMoE, a framework combining a phased curriculum with stateful fusion.
Abstract: Mixture-of-Experts (MoE) training is notoriously difficult, caught between fostering expert specialization and ensuring balanced computation. We recast MoE training through the lens of social choice and, from this perspective, attribute the training difficulty to Arrow's Impossibility Theorem. Inspired by principles from social choice theory, we then present Regulated Mixture-of-Experts (RMoE), an approach that alleviates these difficulties through a phased curriculum for the load-balancing loss and stateful fusion for expert weighting. Extensive experiments on the GLUE and DomainBed benchmarks show that RMoE significantly outperforms standard MoE and dynamic routing baselines. Our work provides a new lens for understanding MoE training and offers a practical framework for building more stable and powerful models. Our code is available at https://anonymous.4open.science/r/R-MoE-E3DC.
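To make the two components concrete, the following is a minimal sketch of what a phased curriculum for the load-balancing loss and a stateful fusion of expert weights could look like. The abstract does not specify the paper's actual schedule, fusion rule, or hyperparameters, so the annealing schedule, the Switch-style auxiliary loss, the EMA-based state, and all names (`load_balance_coeff`, `StatefulRouter`, `momentum`, etc.) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def load_balance_coeff(step: int, warmup_steps: int = 10_000,
                       hi: float = 1e-2, lo: float = 1e-3) -> float:
    """Hypothetical phased curriculum: balance strongly early on,
    then decay the coefficient so experts can specialize."""
    if step < warmup_steps:
        return hi
    decay = max(0.0, 1.0 - (step - warmup_steps) / warmup_steps)
    return lo + (hi - lo) * decay


def load_balance_loss(router_logits: torch.Tensor, top1_idx: torch.Tensor,
                      num_experts: int) -> torch.Tensor:
    """Standard Switch-Transformer-style auxiliary loss: product of the
    per-expert token fraction and the mean router probability."""
    probs = F.softmax(router_logits, dim=-1)                # [tokens, experts]
    frac_tokens = F.one_hot(top1_idx, num_experts).float().mean(0)
    frac_probs = probs.mean(0)
    return num_experts * torch.sum(frac_tokens * frac_probs)


class StatefulRouter(torch.nn.Module):
    """Hypothetical "stateful fusion": per-step routing probabilities are
    fused with an exponential moving average of past routing decisions, so
    expert weighting reflects accumulated preferences rather than a single
    per-step vote."""

    def __init__(self, d_model: int, num_experts: int, momentum: float = 0.9):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, num_experts)
        self.momentum = momentum
        self.register_buffer("state", torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x)                                # [tokens, experts]
        probs = F.softmax(logits, dim=-1)
        # Update the running preference state from the current batch.
        self.state = (self.momentum * self.state
                      + (1 - self.momentum) * probs.mean(0).detach())
        # Fuse instantaneous routing with the accumulated state and renormalize.
        fused = 0.5 * probs + 0.5 * self.state
        return fused / fused.sum(-1, keepdim=True)
```

In this reading, the curriculum addresses the specialization-versus-balance tension over time, while the stateful fusion smooths per-step routing "votes"; the 50/50 fusion weight and the EMA momentum are placeholders.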
Primary Area: learning theory
Submission Number: 22697