Keywords: Time Series Forecasting
Abstract: Time series forecasting under limited data remains challenging due to model overfitting and insufficient structural regularization. In this work, we uncover a sparsity-oriented scaling phenomenon: as training data increases, model parameters naturally become sparser—even in simple linear models. This observation motivates the introduction of learned sparsity as an effective prior to improve model generalization under data-scarce regimes. We propose CrossSparse-MoE, a lightweight forecasting framework that enhances model expressiveness while promoting adaptive sparsity. Built upon a linear backbone, CrossSparse-MoE incorporates cross-channel convolutions to capture short-term inter-variable dependencies and employs a Mixture-of-Experts (MoE) module with non-linear MLPs. A learnable gating network dynamically routes temporal segments to specialized experts, while L1 regularization encourages parameter sparsity without imposing rigid structural constraints. Extensive experiments on multiple benchmarks demonstrate that CrossSparse-MoE consistently outperforms state-of-the-art baselines, particularly in low-data scenarios, validating the effectiveness of combining structural flexibility with learned sparsity.
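To make the described architecture concrete, below is a minimal, illustrative sketch of the components named in the abstract (linear backbone, cross-channel convolution, MoE of non-linear MLP experts with a learnable gate, and an L1 penalty for learned sparsity). All module names, shapes, and hyperparameters here are assumptions for illustration; they are not the paper's actual implementation.

import torch
import torch.nn as nn


class CrossSparseMoE(nn.Module):
    def __init__(self, seq_len, pred_len, n_channels, n_experts=4, hidden=64, kernel=3):
        super().__init__()
        # Linear backbone: maps the input window directly to the forecast horizon.
        self.backbone = nn.Linear(seq_len, pred_len)
        # Cross-channel convolution: mixes variables along the channel axis to
        # capture short-term inter-variable dependencies.
        self.cross_channel = nn.Conv1d(n_channels, n_channels,
                                       kernel_size=kernel, padding=kernel // 2)
        # Mixture-of-Experts: small non-linear MLPs over the temporal dimension.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, pred_len))
            for _ in range(n_experts)
        ])
        # Learnable gating network that assigns expert weights per series.
        self.gate = nn.Linear(seq_len, n_experts)

    def forward(self, x):
        # x: (batch, seq_len, n_channels)
        x = x.transpose(1, 2)                        # (batch, n_channels, seq_len)
        mixed = self.cross_channel(x)                # inter-variable mixing
        base = self.backbone(mixed)                  # (batch, n_channels, pred_len)
        weights = torch.softmax(self.gate(mixed), dim=-1)          # (batch, n_channels, n_experts)
        expert_out = torch.stack([e(mixed) for e in self.experts], dim=-1)
        # (batch, n_channels, pred_len, n_experts) -> weighted sum over experts
        moe = (expert_out * weights.unsqueeze(2)).sum(dim=-1)
        return (base + moe).transpose(1, 2)          # (batch, pred_len, n_channels)

    def l1_penalty(self):
        # L1 regularization over parameters to encourage learned sparsity.
        return sum(p.abs().sum() for p in self.parameters())

In a training loop under these assumptions, the forecasting loss would be combined with the sparsity term, e.g. loss = mse(model(x), y) + lambda_l1 * model.l1_penalty(), where lambda_l1 is a hypothetical regularization weight.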
Code is available in the Appendix.
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 4422