Keywords: Expert Pruning, MoE, LLM, Expert Load Balancing
Abstract: While Mixture-of-Experts (MoE) Large Language Models (LLMs) achieve higher accuracy with fewer active parameters, their pre-training remains challenging due to enormous parameter counts and the low training efficiency caused by imbalanced expert routing. Unlike previous expert pruning methods that focus on the post-training phase, this paper proposes an efficient Expert Pruning Algorithm (EPA) for the pre-training of MoE LLMs. The algorithm improves training efficiency while preserving model accuracy by pruning underutilized experts and rearranging the remaining experts across expert parallel groups according to the token distribution. Extensive experimental results demonstrate that EPA significantly reduces model size and improves training efficiency while keeping accuracy nearly unchanged. Specifically, a 1010B-parameter MoE LLM trained from scratch with EPA achieves substantial gains in training efficiency and delivers strong performance on tasks across diverse domains. The code and the 1010B model will be made publicly available.
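The abstract describes two operations: pruning underutilized experts and rearranging the survivors across expert parallel groups based on token distribution. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the function names, the `keep_ratio` threshold, and the greedy load-balancing heuristic are all assumptions made for demonstration.

```python
import torch


def prune_underutilized_experts(router_token_counts, keep_ratio=0.5):
    """Return indices of experts to keep in one MoE layer, dropping those
    that received the fewest routed tokens during a profiling window.

    router_token_counts: 1-D tensor of tokens routed to each expert.
    keep_ratio: fraction of experts retained (assumed hyperparameter).
    """
    num_experts = router_token_counts.numel()
    num_keep = max(1, int(num_experts * keep_ratio))
    # Keep the most-utilized experts; the rest are pruned.
    keep_idx = torch.topk(router_token_counts, num_keep).indices
    return torch.sort(keep_idx).values


def rearrange_experts_across_ep_groups(kept_token_counts, num_ep_groups):
    """Greedily assign kept experts to expert-parallel groups so that the
    total routed-token load per group is roughly balanced (a simple
    longest-processing-time heuristic, not necessarily the paper's scheme).
    Returned indices refer to positions within the kept-expert set.
    """
    order = torch.argsort(kept_token_counts, descending=True)
    group_load = [0.0] * num_ep_groups
    assignment = [[] for _ in range(num_ep_groups)]
    for expert in order.tolist():
        g = min(range(num_ep_groups), key=lambda i: group_load[i])
        assignment[g].append(expert)
        group_load[g] += kept_token_counts[expert].item()
    return assignment


if __name__ == "__main__":
    # Example: 8 experts with a skewed token distribution from profiling.
    counts = torch.tensor([9500, 120, 8700, 40, 7600, 300, 6900, 15],
                          dtype=torch.float)
    kept = prune_underutilized_experts(counts, keep_ratio=0.5)
    print("kept experts:", kept.tolist())  # [0, 2, 4, 6]
    print("EP-group assignment:",
          rearrange_experts_across_ep_groups(counts[kept], num_ep_groups=2))
```

In this toy example, half of the experts are dropped because they attract very few tokens, and the retained experts are spread over two expert parallel groups so that neither group carries a disproportionate token load.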
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 25312