DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

Published: 01 Jun 2026, Last Modified: 03 Jun 2026AdaptFM PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: mixture-of-experts, end-side LLM, inference acceleration
TL;DR: A sparse MoE architecture that matches dense model performance, under the stringent constraints of identical total parameter budgets and training token counts.
Abstract: While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total parameter footprint creates significant storage and memory-access bottlenecks, which hinder efficient end-side deployment that simultaneously requires high performance, low computational cost, and small storage overhead. To achieve these properties, we present **DECO**, a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training tokens. DECO utilizes the differentiable and flexible ReLU-based routing enhanced by learnable expert-wise scaling, which adaptively balances the contributions of routed and shared experts. Furthermore, we introduce NormSiLU, an activation function that normalizes inputs prior to SiLU operators, producing a more stable trend of routed-expert activation ratio and a higher intrinsic sparsity level. We also identify an empirical advantage in using non-gated MLP experts with ReLU-based routing, indicating the possibility of MoE architecture simplification. Experiments demonstrate that DECO, activating only 20\% of experts, matches dense performance and outperforms established MoE baselines. Our specialized acceleration kernel delivers a 3.00$\times$ speedup on real hardware compared with dense inference. Codes and checkpoints are all available at https://github.com/thunlp/DECO.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36
Loading