Exploration-Driven Reinforcement Learning for Expert Routing Improvement in Mixture-of-Experts Language Models

ACL ARR 2025 May Submission2546 Authors

19 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | Readers: Everyone | CC BY 4.0
Abstract: The performance of MoE-based LLMs depends on the router’s ability to select suitable experts; however, the router is typically not explicitly supervised to acquire this ability. We propose Exploration-Driven Reinforcement Learning (ERL), which explicitly optimizes the router by exploring alternative routing paths. For every input, ERL evaluates the model along (i) the original routing path and (ii) paths in which an $\alpha$-fraction of routing decisions is randomly perturbed, and treats the performance gap between them as an advantage signal for reinforcement learning. Moreover, MoE-ERL$_{wPL}$ intentionally enforces overlap in the experts’ knowledge, which mitigates the risk of performance collapse caused by the expert over-specialization that reinforcement learning on routing can induce. Without adding parameters or external reward models, our method improves summarization (SAMSum, XSUM), question answering (SQuAD), and language modeling (WikiText-2), and raises routing quality, delivering up to 8.9× higher MRR than baselines over 100 perturbed routing paths. Code is available on our GitHub.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Reinforcement Learning, Optimization Methods, Sparse Models
Contribution Types: NLP engineering experiment
Languages Studied: English
Keywords: Mixture-of-Experts, Reinforcement Learning, Router, Large Language Model
Submission Number: 2546
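
The exploration step described in the abstract (perturb an $\alpha$-fraction of routing decisions, then compare performance against the original path) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `perturb_routing` and `routing_advantage`, the top-1 routing layout (one expert index per token), the use of a scalar loss as the performance measure, and the sign convention of the advantage are all assumptions made for illustration.

```python
import numpy as np


def perturb_routing(expert_ids, num_experts, alpha, rng):
    """Reassign an alpha-fraction of routing decisions to a different random expert.

    Hypothetical helper: assumes top-1 routing stored as a 1-D array of expert
    indices and num_experts >= 2.
    """
    expert_ids = np.asarray(expert_ids)
    perturbed = expert_ids.copy()
    k = int(round(alpha * expert_ids.size))
    if k == 0:
        return perturbed
    # Choose which routing decisions to perturb, without repetition.
    positions = rng.choice(expert_ids.size, size=k, replace=False)
    # An offset in [1, num_experts - 1] guarantees the new expert differs.
    offsets = rng.integers(1, num_experts, size=k)
    perturbed[positions] = (perturbed[positions] + offsets) % num_experts
    return perturbed


def routing_advantage(loss_original, loss_perturbed):
    """Performance gap between the two paths (assumed sign convention).

    Positive when the perturbed path reaches a lower loss than the original.
    """
    return loss_original - loss_perturbed


rng = np.random.default_rng(0)
original = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])
explored = perturb_routing(original, num_experts=4, alpha=0.3, rng=rng)
num_changed = int((explored != original).sum())  # 3 of the 10 decisions differ
```

In a training loop, the model would be run once per path and the resulting gap would weight the router's update; how the gap enters the objective is left open here because the abstract does not specify it.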