Keywords: Time Series Forecasting
Abstract: Time series forecasting under limited data remains challenging due to model overfitting and insufficient structural regularization. In this work, we uncover a sparsity-oriented scaling phenomenon: as training data increases, model parameters naturally become sparser—even in simple linear models. This observation motivates the introduction of learned sparsity as an effective prior to improve model generalization under data-scarce regimes. We propose CrossSparse-MoE, a lightweight forecasting framework that enhances model expressiveness while promoting adaptive sparsity. Built upon a linear backbone, CrossSparse-MoE incorporates cross-channel convolutions to capture short-term inter-variable dependencies and employs a Mixture-of-Experts (MoE) module with non-linear MLPs. A learnable gating network dynamically routes temporal segments to specialized experts, while L1 regularization encourages parameter sparsity without imposing rigid structural constraints. Extensive experiments on multiple benchmarks demonstrate that CrossSparse-MoE consistently outperforms state-of-the-art baselines, particularly in low-data scenarios, validating the effectiveness of combining structural flexibility with learned sparsity.
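To make the described architecture concrete, below is a minimal, illustrative sketch of the components named in the abstract (linear backbone, cross-channel convolution, MoE of non-linear MLP experts with a learnable gate, and an L1 penalty for learned sparsity). All module names, shapes, and hyperparameters here are assumptions for illustration; they are not the paper's actual implementation.

import torch
import torch.nn as nn


class CrossSparseMoE(nn.Module):
    def __init__(self, seq_len, pred_len, n_channels, n_experts=4, hidden=64, kernel=3):
        super().__init__()
        # Linear backbone: maps the input window directly to the forecast horizon.
        self.backbone = nn.Linear(seq_len, pred_len)
        # Cross-channel convolution: mixes variables along the channel axis to
        # capture short-term inter-variable dependencies.
        self.cross_channel = nn.Conv1d(n_channels, n_channels,
                                       kernel_size=kernel, padding=kernel // 2)
        # Mixture-of-Experts: small non-linear MLPs over the temporal dimension.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, pred_len))
            for _ in range(n_experts)
        ])
        # Learnable gating network that assigns expert weights per series.
        self.gate = nn.Linear(seq_len, n_experts)

    def forward(self, x):
        # x: (batch, seq_len, n_channels)
        x = x.transpose(1, 2)                        # (batch, n_channels, seq_len)
        mixed = self.cross_channel(x)                # inter-variable mixing
        base = self.backbone(mixed)                  # (batch, n_channels, pred_len)
        weights = torch.softmax(self.gate(mixed), dim=-1)          # (batch, n_channels, n_experts)
        expert_out = torch.stack([e(mixed) for e in self.experts], dim=-1)
        # (batch, n_channels, pred_len, n_experts) -> weighted sum over experts
        moe = (expert_out * weights.unsqueeze(2)).sum(dim=-1)
        return (base + moe).transpose(1, 2)          # (batch, pred_len, n_channels)

    def l1_penalty(self):
        # L1 regularization over parameters to encourage learned sparsity.
        return sum(p.abs().sum() for p in self.parameters())

In a training loop under these assumptions, the forecasting loss would be combined with the sparsity term, e.g. loss = mse(model(x), y) + lambda_l1 * model.l1_penalty(), where lambda_l1 is a hypothetical regularization weight.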
Code is available in the Appendix.
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 4422