ZipMoE: A Theoretically-Grounded Mixture of Experts Approach for Parameter-Efficient Deep Learning

Lin Chen; Kyriakos Axiotis; Gang Fu; Kaiyuan Wang; Mohammadhossein Bateni; Vahab Mirrokni

Published: 03 Feb 2026, Last Modified: 03 Feb 2026
Venue: AISTATS 2026 Poster
License: CC BY 4.0

Abstract: The growing size of large language models (LLMs) presents significant challenges to their efficient training and deployment. To address this, we introduce ZipMoE, a novel family of parameter-efficient building blocks inspired by the Mixture of Experts (MoE) paradigm. ZipMoE provides a modular and efficient alternative to traditional fully connected layers. We theoretically analyze the expressiveness of ZipMoE, demonstrating its advantage over low-rank factorization in terms of representational capacity and test error in a least squares regression setting. Empirical results, including comparisons with low-rank, Monarch, and Kronecker methods, confirm that ZipMoE achieves superior model quality under equivalent parameter or FLOP budgets, highlighting its effectiveness as a parameter-efficient building block for LLMs.
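
To make "equivalent parameter budget" concrete, the sketch below counts the parameters of a dense layer and of two of the baselines named in the abstract (low-rank and Kronecker factorizations), using their standard textbook forms. It is an illustrative assumption rather than the paper's evaluation code: the function names and the 1024-dimensional example are hypothetical, and ZipMoE itself and Monarch are omitted because the abstract does not specify their parameterizations.

```python
# Parameter counts for a linear map from d_in to d_out features (biases ignored).
# All function names here are hypothetical helpers, not part of any library.

def dense_params(d_in: int, d_out: int) -> int:
    """Fully connected layer: one weight per (input, output) pair."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Low-rank factorization W = U V, with U of shape (d_out, rank)
    and V of shape (rank, d_in)."""
    return rank * (d_in + d_out)


def kronecker_params(a_shape: tuple, b_shape: tuple) -> int:
    """Kronecker factorization W = A (x) B. With A of shape (m1, n1) and
    B of shape (m2, n2), W has shape (m1 * m2, n1 * n2)."""
    (m1, n1), (m2, n2) = a_shape, b_shape
    return m1 * n1 + m2 * n2


def max_rank_within_budget(d_in: int, d_out: int, budget: int) -> int:
    """Largest rank whose low-rank factorization fits the parameter budget."""
    return budget // (d_in + d_out)


if __name__ == "__main__":
    d = 1024
    budget = dense_params(d, d) // 8  # e.g. one eighth of the dense layer
    print("dense:", dense_params(d, d))
    print("budget:", budget)
    print("low-rank rank at budget:", max_rank_within_budget(d, d, budget))
    print("kronecker (32x32 factors):", kronecker_params((32, 32), (32, 32)))
```

Matching methods at a fixed budget then amounts to choosing each method's size hyperparameter (the rank, or the factor shapes) so that these counts agree; a FLOP-matched comparison would need a separate per-method cost model.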

Submission Number: 1770