Abstract: Sparse Mixture of Experts (SMoE) is an effective solution for scaling up model capacity without increasing computational cost. A crucial component of SMoE is the router, which directs each input to the relevant experts; however, the router is also a major weakness, leading to routing inconsistency and representation collapse. Instead of fixing the router as in previous works, we propose an alternative that assigns experts to inputs via \emph{indirection}: a discrete representation of the input points to the expert. The discrete representations are learned via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating VQMoE's ability to overcome the challenges of traditional routers. Through extensive evaluations on both large language models and vision tasks, covering pre-training and fine-tuning, we show that VQMoE achieves a 28\% improvement in robustness compared to other SMoE routing methods while maintaining strong performance on fine-tuning tasks.
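To make the indirection idea concrete, below is a minimal, hedged sketch (not the authors' implementation) of how vector-quantized expert assignment might look: each token's hidden state is snapped to its nearest codebook vector, and the code index selects the expert. The class and parameter names (VQExpertAssigner, num_experts, hidden_dim) are hypothetical, and auxiliary VQ losses (codebook/commitment terms) are omitted for brevity.

import torch
import torch.nn as nn


class VQExpertAssigner(nn.Module):
    """Illustrative VQ-based expert assignment: code index == expert index."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        # One codebook vector per expert; the code index doubles as the expert id.
        self.codebook = nn.Embedding(num_experts, hidden_dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, tokens, hidden_dim)
        flat = x.reshape(-1, x.size(-1))  # (N, D)
        # Squared Euclidean distance from each token to each codebook vector.
        dists = (
            flat.pow(2).sum(1, keepdim=True)
            - 2 * flat @ self.codebook.weight.t()
            + self.codebook.weight.pow(2).sum(1)
        )  # (N, E)
        codes = dists.argmin(dim=-1)  # expert index per token
        quantized = self.codebook(codes).view_as(x)
        # Straight-through estimator so gradients flow back to the input encoder.
        quantized = x + (quantized - x).detach()
        return codes.view(x.shape[:-1]), quantized


if __name__ == "__main__":
    assigner = VQExpertAssigner(hidden_dim=16, num_experts=4)
    tokens = torch.randn(2, 5, 16)
    expert_ids, quantized = assigner(tokens)
    print(expert_ids.shape, quantized.shape)  # (2, 5) and (2, 5, 16)

Note that this sketch replaces the learned softmax router of a standard SMoE with a nearest-codebook lookup; how VQMoE actually couples the codebook to the experts follows the paper, not this toy example.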
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=GiIgloq4ng
Changes Since Last Submission: We have updated the submission to the latest format, following the PCs' suggestion.
Assigned Action Editor: ~Naigang_Wang1
Submission Number: 4954