ZipMoE: A Theoretically-Grounded Mixture of Experts Approach for Parameter-Efficient Deep Learning
Abstract: The growing size of large language models (LLMs) poses significant challenges for efficient training and deployment. To address this, we introduce ZipMoE, a novel family of parameter-efficient building blocks inspired by the Mixture of Experts (MoE) paradigm. ZipMoE provides a modular and efficient alternative to traditional fully connected layers. We theoretically analyze the expressiveness of ZipMoE, demonstrating its advantage over low-rank factorization in terms of representational capacity and test error in a least-squares regression setting. Empirical results, including comparisons with low-rank, Monarch, and Kronecker methods, confirm that ZipMoE achieves superior model quality under equivalent parameter or FLOP budgets, highlighting its effectiveness as a parameter-efficient building block for LLMs.
Submission Number: 1770
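To make the general idea of an MoE-style replacement for a fully connected layer concrete, the sketch below shows a toy top-1-routed mixture of small bottlenecked experts whose total parameter count stays well below a dense weight matrix of the same input/output size. This is only an illustrative assumption: the abstract does not specify ZipMoE's actual expert construction, routing scheme, or parameterization, and the class name `ToyMoELinear` and its hyperparameters (`n_experts`, `d_bottleneck`) are hypothetical.

```python
# Minimal sketch of an MoE-style parameter-efficient layer (NOT the
# actual ZipMoE construction, which is not specified in the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELinear(nn.Module):
    """Top-1 routed mixture of small bottlenecked experts.

    Each expert maps d_in -> d_out through a narrow bottleneck, so the
    total parameter count can be kept well below a dense d_in x d_out
    weight matrix while retaining conditional (input-dependent) capacity.
    """

    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, d_bottleneck: int = 32):
        super().__init__()
        self.router = nn.Linear(d_in, n_experts)  # token-to-expert scores
        self.down = nn.Parameter(torch.randn(n_experts, d_in, d_bottleneck) * 0.02)
        self.up = nn.Parameter(torch.randn(n_experts, d_bottleneck, d_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in)
        scores = self.router(x)                                   # (batch, n_experts)
        expert_idx = scores.argmax(dim=-1)                        # hard top-1 routing
        gate = F.softmax(scores, dim=-1).gather(1, expert_idx.unsqueeze(1))  # (batch, 1)
        down = self.down[expert_idx]                              # (batch, d_in, d_bottleneck)
        up = self.up[expert_idx]                                  # (batch, d_bottleneck, d_out)
        h = torch.bmm(x.unsqueeze(1), down)                       # (batch, 1, d_bottleneck)
        y = torch.bmm(h, up).squeeze(1)                           # (batch, d_out)
        return gate * y                                           # scale by routing weight


if __name__ == "__main__":
    d_in, d_out = 512, 512
    dense = nn.Linear(d_in, d_out, bias=False)
    moe = ToyMoELinear(d_in, d_out, n_experts=4, d_bottleneck=32)
    n_dense = sum(p.numel() for p in dense.parameters())
    n_moe = sum(p.numel() for p in moe.parameters())
    print(f"dense params: {n_dense}, MoE-style params: {n_moe}")
    print("output shape:", moe(torch.randn(8, d_in)).shape)
```

With these illustrative settings the mixture uses roughly half the parameters of the dense layer while each input is still processed by an input-dependent expert, which is the general trade-off the abstract's comparisons against low-rank, Monarch, and Kronecker baselines are evaluating.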