Accelerating Dense LLMs via L0-regularized Mixture-of-Experts

Published: 2025 · Last Modified: 04 Nov 2025 · ACL (2) 2025 · License: CC BY-SA 4.0
Abstract: Large language models (LLMs) achieve strong performance but suffer from slow and costly inference. Existing acceleration methods often cause noticeable performance degradation, while Mixture-of-Experts (MoE) models require extensive computational resources. In this paper, we propose L0-MoE, a lightweight MoE approach that uses L0 regularization to accelerate dense LLMs with almost no performance loss. Our method introduces a cluster confusion matrix for domain-aware dataset curation and applies dynamic batching for efficient training. Experiments show that L0-MoE achieves up to 2.5x speedup over dense models while maintaining competitive performance, outperforming existing LLM acceleration baselines.
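The abstract does not spell out how the L0-regularized expert gates are parameterized. A common way to obtain differentiable L0 penalties is the hard-concrete relaxation of Louizos et al. (2018); the sketch below shows such a gate applied to a mixture of feed-forward experts in PyTorch. All names (`L0ExpertGate`, `L0MoELayer`), hyperparameters (`beta`, `gamma`, `zeta`, the penalty weight), and the layer structure are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: L0-regularized expert gating via the hard-concrete
# relaxation (Louizos et al., 2018). Assumed, not taken from the paper.
import torch
import torch.nn as nn


class L0ExpertGate(nn.Module):
    """Per-expert stochastic gates whose expected L0 norm is penalized,
    pushing most gates to exactly zero so few experts stay active."""

    def __init__(self, num_experts, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_experts))  # gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Sample a hard-concrete variable per expert.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta), then clip to [0, 1] -> exact zeros possible.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Differentiable surrogate for the expected number of active experts.
        shift = self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - shift).sum()


class L0MoELayer(nn.Module):
    """Mixture of expert FFNs whose outputs are masked by the L0 gates."""

    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
                )
                for _ in range(num_experts)
            ]
        )
        self.gate = L0ExpertGate(num_experts)

    def forward(self, x):
        z = self.gate()  # (num_experts,); many entries are exactly zero
        out = torch.zeros_like(x)
        for zi, expert in zip(z, self.experts):
            if zi.item() > 0:  # skip pruned experts entirely at inference
                out = out + zi * expert(x)
        return out


if __name__ == "__main__":
    layer = L0MoELayer(d_model=64, d_ff=256, num_experts=8)
    x = torch.randn(4, 16, 64)
    y = layer(x)
    # Hypothetical training objective: task loss plus weighted L0 penalty.
    loss = y.pow(2).mean() + 1e-2 * layer.gate.l0_penalty()
    loss.backward()
    print(y.shape, layer.gate.l0_penalty().item())
```

At convergence, gates driven to zero identify experts that can be dropped outright, which is one plausible route to the reported inference speedup; the paper's actual expert construction, domain-aware curation, and dynamic batching are not reflected in this sketch.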