ZipMoE: A Theoretically-Grounded Mixture of Experts Approach forParameter-Efficient Deep Learning

Published: 03 Feb 2026, Last Modified: 02 May 2026AISTATS 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Abstract: The relentless growth of large language models (LLMs) presents formidable challenges for their training and deployment. To address this critical bottleneck, we introduce ZipMoE, a novel family of parameter-efficient building blocks inspired by the Mixture of Experts (MoE) paradigm. ZipMoE offers a modular and highly efficient substitute for conventional fully connected layers. We provide a rigorous theoretical analysis of ZipMoE's expressiveness, formally demonstrating its superior representational capacity over low-rank factorization. Furthermore, in a least squares regression setting, we prove that ZipMoE achieves a lower test error bound. Our empirical results—featuring comprehensive comparisons against low-rank, Monarch, and Kronecker methods—corroborate these theoretical findings. We demonstrate that ZipMoE consistently attains superior model quality under equivalent parameter or FLOP budgets, establishing it as a potent component for building efficient and powerful deep learning architectures.
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/lc-research-archive/ZipMoE
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 1770
Loading