On Training Mixture-of-Experts: A Social Choice Perspective

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Mixture-of-Experts, Social Choice Theory, Arrow's Impossibility Theorem, Curriculum Learning, Expert Specialization
TL;DR: Viewing MoE training through a social choice lens, we attribute its difficulty to Arrow's Impossibility Theorem and introduce RMoE, a framework with a phased load-balancing curriculum and stateful expert fusion.
Abstract: Mixture-of-Experts (MoE) training faces a dilemma between expert specialization and balanced computation. We recast this problem through the lens of social choice theory: the router acts as a social choice mechanism aggregating tokens' preferences over experts, and we attribute the training difficulties to Arrow's Impossibility Theorem. Inspired by this, we propose Regulated Mixture-of-Experts (RMoE), comprising a phased load-balancing curriculum and stateful fusion for expert weighting. Experiments on GLUE and DomainBed show that RMoE significantly outperforms standard MoE and dynamic routing baselines. Furthermore, RMoE demonstrates strong scalability on large-scale reasoning tasks with Qwen3 and Mixtral architectures. Our code is available at https://anonymous.4open.science/r/R-MoE-E3DC.
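The abstract names two components without detailing them: a phased curriculum that modulates load-balancing pressure over training, and stateful fusion that smooths expert weighting across steps. Below is a minimal, hypothetical sketch of how such a router could look in PyTorch; the class name, the three-phase schedule, the EMA-based fusion, and all hyperparameters are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegulatedRouter(nn.Module):
    """Hypothetical RMoE-style router: a phased curriculum anneals the
    load-balancing loss, and an EMA buffer ("state") smooths expert
    weighting across steps. Names and schedule are assumptions."""

    def __init__(self, d_model: int, n_experts: int, ema_decay: float = 0.9):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.ema_decay = ema_decay
        # Stateful fusion: running average of per-expert gate mass.
        self.register_buffer("gate_state", torch.full((n_experts,), 1.0 / n_experts))

    def lb_coef(self, step: int, warmup: int = 1_000, hold: int = 10_000,
                peak: float = 1e-2) -> float:
        # Phased curriculum (assumed schedule): ramp the balance loss up,
        # hold it, then decay it so specialization can take over.
        if step < warmup:
            return peak * step / warmup
        if step < hold:
            return peak
        return peak * 0.5 ** ((step - hold) / hold)

    def forward(self, x: torch.Tensor, step: int):
        # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)          # (tokens, n_experts)
        if self.training:
            batch_mass = probs.mean(dim=0).detach()
            self.gate_state.mul_(self.ema_decay).add_((1 - self.ema_decay) * batch_mass)
        # Fuse current gates with the running state: damp over-used experts,
        # then renormalize so weights still sum to one per token.
        fused = probs * self.gate_state.clamp_min(1e-6).rsqrt()
        fused = fused / fused.sum(dim=-1, keepdim=True)
        # Switch-style auxiliary balance loss, scaled by the curriculum phase.
        importance = probs.mean(dim=0)
        load = F.one_hot(fused.argmax(-1), probs.size(-1)).float().mean(dim=0)
        aux_loss = self.lb_coef(step) * probs.size(-1) * (importance * load).sum()
        return fused, aux_loss
```

In this reading, the curriculum resolves the specialization/balance dilemma temporally (balance early, specialize late) rather than enforcing both at once, which is one plausible way around the impossibility the abstract invokes.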
Primary Area: learning theory
Submission Number: 22697