Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting
Abstract: Recent advances in deep learning have significantly boosted performance in multivariate time series forecasting (MTSF). While many existing approaches focus on capturing inter-variable (a.k.a. channel-wise) correlations to improve prediction accuracy, the temporal dimension, particularly its rich structural and contextual information, remains underexplored. In this paper, we propose BIM3, a novel framework that integrates BIdirectional temporal-aware modeling with Multi-Scale Mixture-of-Experts for MTSF. First, unlike existing methods that treat historical and future temporal information independently, we introduce a novel Timestamp Dual Cross-Attention Module, which employs a symmetric cross-attention mechanism to explicitly capture bidirectional temporal dependencies through timestamp interactions. Second, to address the complex and scale-varying temporal patterns commonly found in multivariate time series, we move beyond recent multi-scale forecasting models that share parameters across all channels and fail to capture channel-specific dynamics. Instead, we design a Multi-Scale Feature Extract Mixture-of-Experts module that adaptively routes time series to specialized experts based on their temporal characteristics. Extensive experiments on multiple real-world datasets show that BIM3 consistently outperforms state-of-the-art methods, highlighting its effectiveness in capturing both temporal structure and inter-variable diversity.
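The abstract describes its two modules only at a high level. The following is a minimal sketch of the two ideas as stated there, not the authors' implementation. It shows symmetric (dual) cross-attention in which historical timestamp embeddings attend to future ones and vice versa, and per-channel top-k routing to scale-specific experts. Several choices here are assumptions for illustration only: there are no learned Q/K/V projections (keys double as values), the experts are plain trailing moving averages at different window sizes, the routing features are simple channel statistics, and the gate weights are hand-picked. All function names (`dual_cross_attention`, `multiscale_moe`, etc.) are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys):
    # Scaled dot-product attention. Assumption: keys double as values and
    # there are no learned projections, which a real model would have.
    d = len(queries[0])
    out, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(w[j] * keys[j][i] for j in range(len(keys))) for i in range(d)])
        all_weights.append(w)
    return out, all_weights

def dual_cross_attention(hist_ts, fut_ts):
    # Symmetric cross-attention: history attends to future timestamps and
    # future attends to history; residual connections keep the original embeddings.
    hist_ctx, _ = cross_attention(hist_ts, fut_ts)
    fut_ctx, _ = cross_attention(fut_ts, hist_ts)
    hist_out = [[h + c for h, c in zip(hr, cr)] for hr, cr in zip(hist_ts, hist_ctx)]
    fut_out = [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(fut_ts, fut_ctx)]
    return hist_out, fut_out

def moving_average(series, window):
    # Trailing moving average: a hypothetical stand-in for one scale-specific expert.
    out = []
    for t in range(len(series)):
        seg = series[max(0, t - window + 1):t + 1]
        out.append(sum(seg) / len(seg))
    return out

def channel_stats(series):
    # Assumed routing features: mean, standard deviation, mean absolute first difference.
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    roughness = sum(diffs) / len(diffs) if diffs else 0.0
    return [mean, std, roughness]

def top_k_route(features, gate_weights, k=2):
    # One linear gate logit per expert; keep the top-k and renormalize them with softmax.
    logits = [sum(f * w for f, w in zip(features, row)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return list(zip(top, softmax([logits[i] for i in top])))

def multiscale_moe(series, gate_weights, windows, k=2):
    # Each channel is routed on its own statistics, so different channels
    # can end up mixing different experts (scales).
    routes = top_k_route(channel_stats(series), gate_weights, k)
    out = [0.0] * len(series)
    for expert, weight in routes:
        smoothed = moving_average(series, windows[expert])
        out = [o + weight * s for o, s in zip(out, smoothed)]
    return out, routes

# Usage with toy inputs: 3 past and 2 future timestamps, 2-dim embeddings.
hist = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fut = [[0.5, 0.5], [1.0, 0.0]]
h, f = dual_cross_attention(hist, fut)
# Hypothetical gate weights for 3 experts over the 3 routing features.
gates = [[0.1, 1.0, 0.0], [0.0, 0.5, 2.0], [1.0, 0.0, 0.0]]
out, routes = multiscale_moe([1.0, 3.0, 2.0, 5.0, 4.0], gates, windows=[1, 2, 4])
```

Routing each channel independently, rather than passing every channel through one shared multi-scale extractor, is what lets channels with different dynamics select different scales; this mirrors the contrast the abstract draws with channel-shared multi-scale models.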
External IDs: doi:10.1145/3746252.3761273