MoE Lens - An Expert Is All You Need

Published: 05 Mar 2025, Last Modified: 05 Mar 2025 · SLLM · CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Mixture-of-Experts, Interpretability, Sparsity, Early Decoding, DeepSeek
TL;DR: We study expert specialization in DeepSeekMoE and find that a few specialized experts can effectively approximate the full model's performance, indicating potential for inference improvements.
Abstract: Mixture of Experts (MoE) models enable parameter-efficient scaling through sparse expert activations, yet optimizing their inference and memory costs remains challenging due to limited understanding of their specialization behavior. We present a systematic analysis of expert specialization in MoEs through two complementary approaches: domain-specific routing patterns and an early decoding framework that tracks expert contributions to output representations. Our analysis of the DeepSeekMoE model reveals that, despite having 64 routed experts with 6 active in each layer's computation, the model predominantly relies on a few specialized experts, with the top-weighted expert's output closely approximating the full ensemble prediction. We quantitatively validate these findings through a systematic analysis of the token routing distribution, demonstrating that very few experts handle over 50% of routing decisions across English, French, and Code domains. Hidden-state similarity between the single top expert and the full ensemble is extremely high at every layer, with some layers reaching cosine similarity as high as 0.95, and perplexity increases by only 5% when using a single expert across all three domains.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Marmik_Chaudhari1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 86
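
The two measurements described in the abstract can be illustrated with a minimal PyTorch sketch. The layer below is a toy stand-in, not the authors' DeepSeekMoE code: the expert count (64) and top-6 routing match the paper, but the hidden dimension, single-linear-map experts, softmax gating, and token batch are illustrative assumptions.

```python
# Toy sketch of (i) top-1 routing concentration across experts and
# (ii) cosine similarity between the top-weighted expert's output and the
# weighted ensemble of all active experts in an MoE layer.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, n_experts, top_k, n_tokens = 512, 64, 6, 1024

# Toy routed experts: one linear map each, standing in for expert FFNs.
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model, bias=False) for _ in range(n_experts)]
)
router = torch.nn.Linear(d_model, n_experts, bias=False)

x = torch.randn(n_tokens, d_model)  # hidden states for a batch of tokens

with torch.no_grad():
    gate = F.softmax(router(x), dim=-1)           # (n_tokens, n_experts)
    weights, idx = gate.topk(top_k, dim=-1)       # keep the 6 active experts
    weights = weights / weights.sum(-1, keepdim=True)

    sims = []
    for t in range(n_tokens):
        outs = torch.stack([experts[int(i)](x[t]) for i in idx[t]])  # (top_k, d_model)
        ensemble = (weights[t].unsqueeze(-1) * outs).sum(0)          # full MoE output
        single = outs[0]                                             # top-weighted expert only
        sims.append(F.cosine_similarity(single, ensemble, dim=0))

    # (i) Routing concentration: share of top-1 routing decisions handled by
    #     the few most frequently chosen experts.
    top1_counts = torch.bincount(idx[:, 0], minlength=n_experts)
    top3_share = top1_counts.sort(descending=True).values[:3].sum().item() / n_tokens
    print(f"share of top-1 routing handled by 3 busiest experts: {top3_share:.2%}")

    # (ii) With trained DeepSeekMoE weights the paper reports similarities up to
    #      ~0.95; with these random toy weights the number will be much lower.
    print(f"mean cosine similarity (single vs. ensemble): {torch.stack(sims).mean().item():.3f}")
```

In the paper's setting the same comparison is run on the trained model's hidden states per layer and per domain; the sketch only shows the shape of the computation, not the reported numbers.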