ScaleMoR: Multi-Scale Mixture of Recursive Linear Experts for Time Series Forecasting

Yan ZHANG; Charmayne Mary Lee Hughes

03 Sept 2025 (modified: 25 Dec 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Time Series Forecasting, Mixture of Experts, Recursive Neural Networks, Multi-Scale Representation

Abstract: Multivariate time-series forecasting across multiple horizons faces two major challenges: temporal misalignment when aggregating multi-scale representations, and computation that is allocated uniformly regardless of sequence complexity. Existing methods either lose temporal dependencies during multi-scale fusion or allocate computation uniformly, ignoring varying input characteristics, which reduces long-range forecasting performance. We propose ScaleMoR, a novel architecture that redefines mixture-of-experts for temporal modeling. ScaleMoR applies recursive, scale-specific linear transformations as "experts", enabling parameter-efficient conditional computation. The method introduces three key innovations: (1) temporal-aligned multi-scale tokenization, which preserves chronological consistency across fine (2-step), medium (6-step), coarse (12-step), and macro (24-step) windows using learned Gaussian-weighted interpolation; (2) multi-dimensional complexity routing, which dynamically allocates computation based on trend, seasonal, noise, and volatility characteristics instead of a single complexity measure; and (3) hierarchical recursive modules, where deeper layers employ SwiGLU gating and dilated convolutions, achieving progressively richer representations through linear operations alone. We adopt a progressive three-phase training strategy that first learns tokenization, then introduces routing with entropy regularization, and finally optimizes the full architecture. Across ten benchmark datasets and multiple forecast horizons, ScaleMoR consistently outperforms state-of-the-art models, with particularly strong gains on long-range prediction tasks. It uses 75–85% fewer parameters and 96–99% fewer FLOPs than recent attention-based and clustering-based approaches, while maintaining or surpassing their accuracy. These results highlight ScaleMoR as a highly accurate and efficient solution for multivariate time-series forecasting, well suited to real-world domains such as finance, energy, and industrial monitoring.
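
To make the multi-scale tokenization idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It pools a univariate series over the four window sizes named above (2, 6, 12, 24) and maps each scale back onto the original time grid with Gaussian weights centered on each window, so that all scales stay chronologically aligned. The function names, the mean pooling, and the fixed bandwidth `sigma_scale` are assumptions made for illustration; the abstract states that the interpolation weights are learned, which this sketch does not do.

```python
import numpy as np

# Window sizes taken from the abstract: fine, medium, coarse, macro.
WINDOWS = (2, 6, 12, 24)


def tokenize(x, window):
    """Mean-pool non-overlapping windows (pooling choice is an assumption).

    Returns the token values and the time index at the center of each window.
    """
    n = len(x) // window
    tokens = x[: n * window].reshape(n, window).mean(axis=1)
    centers = np.arange(n) * window + (window - 1) / 2.0
    return tokens, centers


def gaussian_align(tokens, centers, length, sigma):
    """Interpolate tokens onto the original grid with normalized Gaussian weights.

    sigma is fixed here; in the abstract the weighting is described as learned.
    """
    t = np.arange(length)[:, None]                            # (length, 1)
    w = np.exp(-0.5 * ((t - centers[None, :]) / sigma) ** 2)  # (length, n_tokens)
    w /= w.sum(axis=1, keepdims=True)                         # rows sum to 1
    return w @ tokens                                         # (length,)


def multi_scale_features(x, sigma_scale=0.5):
    """Stack one time-aligned representation per scale: shape (4, len(x))."""
    x = np.asarray(x, dtype=float)
    aligned = []
    for window in WINDOWS:
        tokens, centers = tokenize(x, window)
        aligned.append(
            gaussian_align(tokens, centers, len(x), sigma=sigma_scale * window)
        )
    return np.stack(aligned, axis=0)


# Hypothetical usage on a synthetic series of 48 steps.
series = np.sin(np.linspace(0.0, 4.0 * np.pi, 48))
features = multi_scale_features(series)
```

Because every scale is returned on the same time grid, the four rows of `features` can be compared or combined step by step, which is the alignment property the abstract emphasizes.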

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 1786
