RPATH: Explaining Time Series Mixture of Experts Routing via Ensemble Consensus and Structural Robustness

TMLR Paper6953 Authors

10 Jan 2026 (modified: 17 Jan 2026) | Under review for TMLR | CC BY 4.0
Abstract: Mixture-of-Experts (MoE) architectures achieve strong performance in time series forecasting through sparse expert activation, but understanding \textit{why} specific experts are selected remains challenging. We present RPATH (Routing Pathway Analysis for Temporal Hierarchies), a post-hoc explainability framework for time series MoE models that combines temporal saliency mapping with counterfactual generation. Evaluating on Time-MoE-50M across 300 expert-sample pairs, we discover two properties of the routing architecture: (1) \textit{Ensemble Consensus}, where experts at different layers independently converge on the same critical temporal windows (mean saliency Intersection over Union (IoU) = 0.677), rather than developing distinct specializations; and (2) \textit{Structural Robustness}, characterized by a 300-fold ``Stability Gap'' where gentle perturbations alter routing in only 0.3\% of cases while aggressive perturbations succeed in 99.7\%, indicating that routing decisions reflect structural anchors rather than superficial signal characteristics. Together, these findings demonstrate that Time-MoE achieves reliable forecasting through \textit{Ensemble Redundancy}: multiple experts verify the same structural features, providing consensus that is insensitive to noise but responsive to fundamental signal changes. Our framework provides practitioners with tools to visualize expert attention, identify critical input regions, and quantify routing stability for deployed MoE models.
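The two quantities named in the abstract, saliency IoU and the Stability Gap, can be illustrated with a minimal sketch. It is not the RPATH implementation. It assumes that each expert's saliency map is binarized by keeping its top-k time steps and that the Stability Gap is the ratio of routing-flip rates under aggressive versus gentle perturbations. The helper names (`top_k_mask`, `saliency_iou`, `flip_rate`) and all numbers in the usage lines are hypothetical.

```python
import numpy as np

def top_k_mask(saliency, k):
    """Boolean mask marking the k time steps with the largest saliency (assumed binarization)."""
    saliency = np.asarray(saliency, dtype=float)
    mask = np.zeros(saliency.shape[0], dtype=bool)
    mask[np.argsort(saliency)[-k:]] = True
    return mask

def saliency_iou(mask_a, mask_b):
    """Intersection over Union of two boolean saliency masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union else 1.0

def flip_rate(flipped):
    """Fraction of expert-sample pairs whose routing changed after perturbation."""
    return float(np.mean(flipped))

# Hypothetical saliency maps for two experts over six time steps.
expert_a = [0.1, 0.9, 0.8, 0.2, 0.7, 0.0]
expert_b = [0.0, 0.8, 0.9, 0.7, 0.1, 0.2]
iou = saliency_iou(top_k_mask(expert_a, 3), top_k_mask(expert_b, 3))

# Hypothetical flip outcomes over 300 pairs: 1 flip under gentle
# perturbation, 299 flips under aggressive perturbation.
gentle = np.zeros(300, dtype=bool)
gentle[0] = True
aggressive = np.ones(300, dtype=bool)
aggressive[0] = False
stability_gap = flip_rate(aggressive) / flip_rate(gentle)
```

In this toy case the two masks share two of four selected time steps, so the IoU is 0.5. The illustrative flip counts give rates of roughly 0.3% and 99.7% and a ratio of about 299, which is one way a gap of roughly 300-fold could arise; the paper's actual counts are not stated in the abstract.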
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Christopher_Mutschler1
Submission Number: 6953