Uncertainty-Aware Routing for Principled Alignment with MoE Dynamics

ACL ARR 2026 January Submission8440 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Large Language Model, Mixture of Experts, Entropy, Efficiency
Abstract: Mixture-of-Experts (MoE) is a cornerstone for scaling LLMs, yet its training dynamics remain poorly understood, often leading to sub-optimal specialization. Moving beyond static routing, we present a systematic study of the MoE lifecycle using Helmholtz Free Energy and Router Entropy. We identify a universal Three-Stage Phase Transition (Exploration, Symmetry Breaking, and Stabilization) marked by an Energy "Climb" and Plateau. This pattern reflects Frustrated Exploration, caused by structural interference between specialization drives and uniformity constraints. To address this, we propose Uncertainty-Aware Routing (UAR), which aligns routing with the model's epistemic state via: (1) Evidence-Triggered Expansion, which increases the number of active experts for high-energy tokens, and (2) Epistemic Masking, which applies load balancing only in high-uncertainty regimes to shield mature experts. Experiments confirm that UAR reduces perplexity and improves expert distinctiveness, offering a principled path toward thermodynamically aligned computation.
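The abstract describes UAR's two mechanisms only at a high level. The following is a minimal sketch of how such a router could look in PyTorch, assuming a standard top-k MoE layer; the threshold `tau`, the expanded expert count `k_expand`, the importance-based load-balancing surrogate, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Uncertainty-Aware Routing (UAR) for a top-k MoE router.
# Hypothetical names and hyperparameters; not the paper's released code.
import torch
import torch.nn.functional as F


def uar_route(router_logits: torch.Tensor, k_base: int = 2, k_expand: int = 4,
              tau: float = 0.8):
    """Route tokens with entropy-dependent top-k and masked load balancing.

    router_logits: (num_tokens, num_experts) raw router scores.
    Returns (topk_idx, topk_weights, aux_loss).
    """
    probs = F.softmax(router_logits, dim=-1)                   # (T, E)
    num_experts = probs.size(-1)

    # Router entropy per token, normalized to [0, 1];
    # high entropy is read as high epistemic uncertainty.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    entropy = entropy / torch.log(torch.tensor(float(num_experts)))
    uncertain = entropy > tau                                   # (T,) bool

    # (1) Evidence-Triggered Expansion: uncertain tokens use k_expand experts,
    # confident tokens keep k_base (extra expert weights are zeroed out).
    topk_weights, topk_idx = probs.topk(k_expand, dim=-1)       # (T, k_expand)
    slot = torch.arange(k_expand, device=probs.device).expand_as(topk_weights)
    keep = (slot < k_base) | uncertain.unsqueeze(-1)
    topk_weights = topk_weights * keep
    topk_weights = topk_weights / topk_weights.sum(-1, keepdim=True)

    # (2) Epistemic Masking: the load-balancing auxiliary loss (here a
    # simplified importance-based surrogate) is computed only over
    # high-uncertainty tokens, shielding already-specialized experts.
    if uncertain.any():
        importance = probs[uncertain].mean(0)                   # (E,)
        aux_loss = num_experts * (importance * importance).sum()
    else:
        aux_loss = probs.new_zeros(())

    return topk_idx, topk_weights, aux_loss


if __name__ == "__main__":
    logits = torch.randn(16, 8)      # 16 tokens, 8 experts
    idx, w, aux = uar_route(logits)
    print(idx.shape, w.shape, aux.item())
```

In this sketch the router always gathers `k_expand` candidate experts and simply masks the surplus slots for confident tokens, which keeps the gather shape static; a production implementation would likely dispatch a variable number of experts per token instead.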
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: uncertainty, sparse models, LLM Efficiency
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 8440