Mixture of Heterogeneous Grouped Experts for Language Modeling

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: MoE, LLM, GPU Load Balance, Heterogeneous Experts
Abstract: Mixture-of-Experts (MoE) offers superior performance over dense models. However, current MoEs impose a critical limitation by enforcing uniform expert sizes, restricting the model's ability to dynamically match computational resources with token-specific requirements. Although several attempts at heterogeneous experts have been made, they struggle either with limited performance and inefficient parameter utilization or with unbalanced GPU utilization; a general heterogeneous MoE architecture is still lacking. To this end, we present Mixture of Heterogeneous Grouped Experts (MoHGE), an innovative MoE architecture that introduces a two-level routing mechanism and enables more nuanced and efficient expert selection tailored to each input token's characteristics. We also propose a Group-Wise Auxiliary Loss to improve parameter utilization without compromising model performance. To address the resulting workload imbalance, we develop (1) an All-size Group-decoupling Allocation strategy and (2) an Intra-Group Experts Auxiliary Loss, which together ensure balanced GPU utilization. Extensive evaluations on multiple benchmarks demonstrate that MoHGE achieves performance comparable to state-of-the-art MoE architectures while reducing the total parameter count by approximately 20\% and maintaining balanced GPU utilization. Our work establishes a new paradigm for resource-aware MoE design, better aligning computational allocation with actual inference demands.
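Illustration (not from the submission): the two-level routing described in the abstract could be sketched as below, where a group-level router first selects an expert group (groups may differ in expert hidden size) and an intra-group router then selects experts within that group. All class names, group sizes, the top-1 group / top-k expert choices, and the auxiliary-loss handling are assumptions made purely for illustration.

```python
# Minimal sketch of two-level (group -> expert) routing over heterogeneous
# expert groups. This is an assumption-laden illustration, not the MoHGE code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeterogeneousGroupedMoE(nn.Module):
    def __init__(self, d_model, group_hidden_dims, experts_per_group, top_k_experts=2):
        super().__init__()
        self.num_groups = len(group_hidden_dims)
        self.top_k_experts = top_k_experts
        # Level-1 router: scores each expert group for a token.
        self.group_router = nn.Linear(d_model, self.num_groups, bias=False)
        # Level-2 routers: one per group, scoring that group's experts.
        self.expert_routers = nn.ModuleList(
            [nn.Linear(d_model, n_exp, bias=False) for n_exp in experts_per_group]
        )
        # Heterogeneous experts: each group uses its own FFN hidden size.
        self.groups = nn.ModuleList(
            [
                nn.ModuleList(
                    [
                        nn.Sequential(
                            nn.Linear(d_model, hidden),
                            nn.GELU(),
                            nn.Linear(hidden, d_model),
                        )
                        for _ in range(n_exp)
                    ]
                )
                for hidden, n_exp in zip(group_hidden_dims, experts_per_group)
            ]
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        # Level 1: assign each token to its highest-scoring group (top-1).
        group_probs = F.softmax(self.group_router(x), dim=-1)
        group_idx = group_probs.argmax(dim=-1)

        out = torch.zeros_like(x)
        for g in range(self.num_groups):
            mask = group_idx == g
            if not mask.any():
                continue
            tokens = x[mask]
            # Level 2: mix the top-k experts inside the chosen group.
            expert_probs = F.softmax(self.expert_routers[g](tokens), dim=-1)
            k = min(self.top_k_experts, expert_probs.size(-1))
            weights, expert_idx = expert_probs.topk(k, dim=-1)
            weights = weights / weights.sum(dim=-1, keepdim=True)
            mixed = torch.zeros_like(tokens)
            for j in range(k):
                for e, expert in enumerate(self.groups[g]):
                    sel = expert_idx[:, j] == e
                    if sel.any():
                        mixed[sel] += weights[sel, j:j + 1] * expert(tokens[sel])
            out[mask] = mixed
        return out


# Usage: three groups with different expert widths (a hypothetical configuration).
layer = HeterogeneousGroupedMoE(
    d_model=256, group_hidden_dims=[256, 512, 1024], experts_per_group=[8, 4, 2]
)
y = layer(torch.randn(32, 256))
```

In a full implementation, the group-wise and intra-group auxiliary losses mentioned in the abstract would additionally penalize skewed group and expert assignment frequencies during training; those terms are omitted from this sketch.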
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10611