Abstract: Mixture-of-Experts (MoE) models keep computational cost low despite their large parameter counts.
However, unbalanced expert selection during routing leads to inefficient use of those parameters.
An auxiliary loss is therefore commonly used to make expert selection uniform, but it has been found to interfere with language-model performance.
In this study, we propose a supervised learning approach to MoE routing that uses token frequencies as the supervision signal.
This method aims to align expert selection with the knowledge each expert has acquired.
As a case study, we focus on domain adaptation to the legal domain.
Without the auxiliary loss, the proposed method achieved results comparable to the baseline trained with the auxiliary loss.
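To make the two routing objectives concrete, here is a minimal numpy sketch of a Switch-Transformer-style top-1 router with the standard load-balancing auxiliary loss, alongside a hypothetical supervised routing loss driven by per-token expert labels (standing in for the paper's token-frequency signal; the labels and shapes here are illustrative assumptions, not the authors' exact formulation).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the expert dimension.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts = 8, 4, 2

tokens = rng.normal(size=(num_tokens, d_model))
w_router = rng.normal(size=(d_model, num_experts))

# Router probabilities and top-1 expert assignment per token.
probs = softmax(tokens @ w_router)          # shape (tokens, experts)
assign = probs.argmax(axis=-1)              # chosen expert per token

# Switch-style auxiliary load-balancing loss: N * sum_i f_i * P_i,
# where f_i is the fraction of tokens routed to expert i and P_i is
# the mean router probability mass on expert i. Minimized (value 1)
# when routing is perfectly uniform.
f = np.bincount(assign, minlength=num_experts) / num_tokens
P = probs.mean(axis=0)
aux_loss = num_experts * float((f * P).sum())

# Hypothetical supervised alternative: cross-entropy against expert
# labels derived from token statistics (placeholder random labels here;
# the paper derives its signal from corpus token frequencies).
labels = (rng.random(num_tokens) < 0.5).astype(int)
supervised_loss = float(-np.log(probs[np.arange(num_tokens), labels]).mean())
```

The point of the contrast: the auxiliary loss pushes the router toward uniform expert usage regardless of token content, while the supervised loss ties each token's routing target to a property of the token itself.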
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: continual learning, fine-tuning
Contribution Types: NLP engineering experiment
Languages Studied: Japanese
Submission Number: 1010