Astrocyte-Inspired Hierarchical Routing for Enhanced Expert Specialization in Mixture-of-Experts Models

TMLR Paper 6399 Authors

06 Nov 2025 (modified: 13 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: The Mixture-of-Experts (MoE) architecture is a leading paradigm for scaling model capacity, but cultivating genuine expert specialization is a persistent challenge, often hindered by load-balancing constraints. This paper introduces Astrocyte-Hierarchical Routing (AHR), a novel, bio-inspired mechanism that addresses this challenge. Drawing inspiration from astrocytes, AHR conditions local, token-level routing decisions on a global context signal. In our encoder-based implementation, this signal, derived from the [CLS] token, additively biases local routing decisions, promoting a developmental trajectory for expert functionality. We conduct experiments on a multi-class text classification task, comparing AHR against strong baselines. The results demonstrate that AHR achieves a statistically significant and substantial increase in final-layer expert specialization without incurring a discernible loss in task performance. Qualitative analysis further confirms that AHR fosters a transition from generalist experts in early layers to highly specialized experts in later layers. This work presents a new principle for MoE router design: a contextual, two-level approach. This successful validation in an encoder model serves as a proof-of-concept, paving the way for future work on scaling AHR and adapting its principle to other architectures.
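The routing principle described in the abstract, local token-level router logits additively biased by a global signal derived from the [CLS] token, can be illustrated with a minimal sketch. This is an assumption-laden toy implementation, not the paper's actual code: the function and weight names (`context_biased_route`, `W_local`, `W_global`) are hypothetical, and details such as the top-k value and normalization may differ from the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the expert dimension.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_biased_route(hidden, W_local, W_global, top_k=2):
    """Sketch of two-level MoE routing: each token's local logits are
    additively biased by a global signal from the [CLS] token (position 0).
    All parameter names here are illustrative, not from the paper."""
    cls_bias = hidden[:, 0:1, :] @ W_global   # (batch, 1, n_experts) global bias
    logits = hidden @ W_local + cls_bias      # broadcast bias over seq_len
    probs = softmax(logits)
    experts = np.argsort(-probs, axis=-1)[..., :top_k]          # top-k expert ids
    weights = np.take_along_axis(probs, experts, axis=-1)       # their gate weights
    return weights, experts

# Toy usage: batch of 2 sequences, 10 tokens, hidden size 16, 8 experts.
rng = np.random.default_rng(0)
d_model, n_experts = 16, 8
hidden = rng.standard_normal((2, 10, d_model))
w, idx = context_biased_route(hidden,
                              rng.standard_normal((d_model, n_experts)),
                              rng.standard_normal((d_model, n_experts)))
print(w.shape, idx.shape)  # (2, 10, 2) (2, 10, 2)
```

Because `cls_bias` is shared across all tokens in a sequence, it shifts every token's routing distribution in the same direction, which is one plausible way a global context signal could steer local decisions toward context-appropriate experts.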
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Stefano_Sarao_Mannelli1
Submission Number: 6399