Mixture of Experts for Image Classification: What's the Sweet Spot?

TMLR Paper5177 Authors

22 Jun 2025 (modified: 26 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Mixture-of-Experts (MoE) models have shown promise for parameter-efficient scaling across domains. However, their application to image classification remains limited, often requiring billion-scale datasets to be competitive. In this work, we integrate MoE layers into image classification architectures trained on open datasets and conduct a systematic analysis across MoE configurations and model scales. We find that activating a moderate number of parameters per sample offers the best trade-off between performance and efficiency, while the benefits of MoE diminish as the number of activated parameters grows. Our findings offer practical guidance for efficient MoE-based model design in image classification.
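To make the notion of "parameters activated per sample" concrete, the sketch below shows a minimal top-k-routed MoE feed-forward layer in PyTorch. This is an illustration of the general technique, not the paper's implementation; the class name `MoELayer` and all hyperparameters (expert count, top-k, dimensions) are assumptions chosen for readability.

```python
# Minimal top-k-routed MoE feed-forward layer (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). Each token is routed to only top_k experts,
        # so only a fraction of the layer's parameters is activated per sample.
        scores = self.gate(x)                            # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)             # normalize the kept scores
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    layer = MoELayer(dim=192, hidden_dim=768)
    tokens = torch.randn(16, 192)        # e.g. 16 image patch tokens
    print(layer(tokens).shape)           # torch.Size([16, 192])
```

With this layout, increasing `top_k` (or the expert width) raises the activated parameter count per sample; the abstract's finding is that a moderate setting gives the best performance/efficiency trade-off, with diminishing returns beyond it.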
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Weijian_Deng1
Submission Number: 5177