ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning
Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities but face deployment challenges due to their high computational demands. Traditional pruning methods reduce these costs by permanently removing parameters, which inevitably leads to performance degradation. To mitigate this issue, we propose ToMoE, a method that transforms dense LLMs into Mixture-of-Experts (MoE) models by uncovering experts inherently present within dense models, without requiring any weight updates. ToMoE leverages dynamic structural pruning to unify expert construction and router training in a single stage, achieving consistently strong performance. Remarkably, even without fine-tuning the model weights, ToMoE consistently outperforms state-of-the-art pruning and MoE techniques across Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5 models.
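To make the dense-to-MoE idea in the abstract concrete, the sketch below illustrates one generic way a dense FFN layer can be re-expressed as a Mixture-of-Experts: its intermediate neurons are partitioned into expert slices and a small learned router gates them per token, while the original dense weights remain frozen. This is a minimal conceptual sketch under those assumptions, not the ToMoE algorithm itself; all names (`DenseFFN`, `FFNAsMoE`, `num_experts`, `top_k`) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """A standard dense transformer FFN block."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class FFNAsMoE(nn.Module):
    """Views a frozen dense FFN as an MoE: each expert is a contiguous slice
    of the intermediate dimension; only the router is newly trained."""
    def __init__(self, dense: DenseFFN, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        d_ff = dense.up.out_features
        assert d_ff % num_experts == 0
        self.dense, self.num_experts, self.top_k = dense, num_experts, top_k
        self.chunk = d_ff // num_experts
        self.router = nn.Linear(dense.up.in_features, num_experts)

    def forward(self, x):  # x: (tokens, d_model)
        h = F.gelu(self.dense.up(x))                       # (tokens, d_ff)
        gates = self.router(x).softmax(dim=-1)             # (tokens, num_experts)
        topk_val, topk_idx = gates.topk(self.top_k, dim=-1)
        # Keep only the selected experts' gate values; zero out the rest.
        mask = torch.zeros_like(gates).scatter_(-1, topk_idx, topk_val)
        # Broadcast each expert's gate over its slice of intermediate neurons.
        mask = mask.repeat_interleave(self.chunk, dim=-1)  # (tokens, d_ff)
        return self.dense.down(h * mask)
```

In this illustration the full intermediate activation is computed and then masked for clarity; an efficient implementation would evaluate only the selected expert slices to realize the compute savings that sparse routing is meant to provide.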
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yu_Cheng1
Submission Number: 5955