Reinforced Adaptive Routing for Mixture-of-Experts Models

ICLR 2026 Conference Submission17369 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Mixture-of-experts, Large Language Model, Adaptive Routing, Reinforcement Learning
Abstract: With the rapid development of large language models (LLMs), the mixture-of-experts (MoE) architecture has attracted increasing attention for its ability to scale model capacity and improve performance. However, MoE requires activating multiple experts during training and inference, which introduces substantial computational and memory overhead, making acceleration essential in resource-constrained or latency-sensitive settings. Existing adaptive expert selection approaches often rely on heuristics or single-source supervision and lack a unified formulation that simultaneously captures accuracy, balanced utilization, and efficiency. To address this, we propose a reinforcement learning-based adaptive routing approach that integrates a policy network into the standard MoE framework and optimizes expert selection end-to-end with a multi-objective reward. Experiments on benchmark datasets demonstrate that our approach substantially improves training efficiency while maintaining accuracy and promoting more balanced expert utilization.
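The abstract does not specify how the policy network or the multi-objective reward is implemented. The following is a minimal illustrative sketch, not the authors' method: it assumes a REINFORCE-style router (hypothetical names `AdaptiveRLRouter`, `multi_objective_reward`, and weights `w_bal`, `w_eff`) that samples per-token expert activations and is rewarded for low task loss, balanced expert usage, and activating few experts.

```python
# Illustrative sketch only (assumed design, not the submission's code):
# a policy network scores experts per token, samples a binary activation mask,
# and is trained with REINFORCE against a multi-objective reward.
import torch
import torch.nn as nn


class AdaptiveRLRouter(nn.Module):
    """Hypothetical RL router: one Bernoulli activation decision per expert per token."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.policy = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor):
        # x: [tokens, d_model] -> logits over experts: [tokens, num_experts]
        logits = self.policy(x)
        dist = torch.distributions.Bernoulli(logits=logits)
        mask = dist.sample()                     # 1 = expert activated for this token
        log_prob = dist.log_prob(mask).sum(-1)   # log-probability of the sampled routing
        return mask, log_prob


def multi_objective_reward(task_loss, mask, num_experts,
                           w_bal: float = 0.1, w_eff: float = 0.05):
    """Reward = accuracy term - load-imbalance penalty - efficiency penalty.
    The weights are assumed hyperparameters, not values from the paper."""
    acc_term = -task_loss.detach()                           # lower loss -> higher reward
    usage = mask.mean(dim=0)                                 # fraction of tokens per expert
    balance_pen = ((usage - 1.0 / num_experts) ** 2).sum()   # deviation from uniform usage
    eff_pen = mask.sum(dim=-1).mean()                        # avg experts activated per token
    return acc_term - w_bal * balance_pen - w_eff * eff_pen


# Sketch of the end-to-end update (policy gradient on routing decisions):
#   mask, log_prob = router(hidden_states)
#   reward = multi_objective_reward(task_loss, mask, num_experts)
#   router_loss = -(reward.detach() * log_prob).mean()
#   total_loss = task_loss + router_loss
```

In this sketch the reward is detached so gradients reach the router only through the REINFORCE term, while the experts and the rest of the model are trained by the ordinary task loss; whether the submission uses this estimator or a different policy-gradient variant is not stated in the abstract.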
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17369