Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting

Yifan Gao, Boming Zhao, Haocheng Peng, Hujun Bao, Jiashu Zhao, Zhaopeng Cui

Published: 10 Nov 2025, Last Modified: 09 Nov 2025, Crossref, Everyone, CC BY-SA 4.0
Abstract: Recent advances in deep learning have significantly boosted performance in multivariate time series forecasting (MTSF). While many existing approaches focus on capturing inter-variable (a.k.a. channel-wise) correlations to improve prediction accuracy, the temporal dimension, particularly its rich structural and contextual information, remains underexplored. In this paper, we propose BIM3, a novel framework that integrates BIdirectional temporal-aware modeling with Multi-Scale Mixture-of-Experts for MTSF. First, unlike existing methods that treat historical and future temporal information independently, we introduce a novel Timestamp Dual Cross-Attention Module, which employs a symmetric cross-attention mechanism to explicitly capture bidirectional temporal dependencies through timestamp interactions. Second, to address the complex and scale-varying temporal patterns commonly found in multivariate time series, we move beyond recent multi-scale forecasting models that share parameters across all channels and fail to capture channel-specific dynamics. Instead, we design a Multi-Scale Feature Extract Mixture-of-Experts module that adaptively routes time series to specialized experts based on their temporal characteristics. Extensive experiments on multiple real-world datasets show that BIM3 consistently outperforms state-of-the-art methods, highlighting its effectiveness in capturing both temporal structure and inter-variable diversity.
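The abstract names two mechanisms without giving their equations: a symmetric cross-attention between historical and future timestamps, and a router that assigns series to experts. The following is a minimal NumPy sketch of those two generic ideas, not the BIM3 implementation. Everything in it is an assumption made for illustration: the function names (`cross_attention`, `dual_cross_attention`, `route_top1`), the absence of learned query/key/value projections, the top-1 gating rule, and all tensor shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention in which one sequence queries the other.
    # Assumption: no learned projections, to keep the sketch short.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

def dual_cross_attention(hist_ts, future_ts):
    # Symmetric (bidirectional) use of cross-attention:
    # historical timestamps attend to future ones, and vice versa.
    hist_out = cross_attention(hist_ts, future_ts)    # shape (L, d)
    future_out = cross_attention(future_ts, hist_ts)  # shape (H, d)
    return hist_out, future_out

def route_top1(channel_features, gate_weights):
    # Hypothetical top-1 gate: each channel is sent to the expert
    # with the highest gate probability.
    gate_probs = softmax(channel_features @ gate_weights, axis=-1)
    return gate_probs.argmax(axis=-1), gate_probs

rng = np.random.default_rng(0)
hist_ts = rng.normal(size=(96, 16))    # assumed: 96 past timestamp embeddings
future_ts = rng.normal(size=(24, 16))  # assumed: 24 future timestamp embeddings
hist_out, future_out = dual_cross_attention(hist_ts, future_ts)

channel_features = rng.normal(size=(7, 16))  # assumed: 7 channels
gate_weights = rng.normal(size=(16, 4))      # assumed: 4 experts
expert_ids, gate_probs = route_top1(channel_features, gate_weights)
```

Routing per channel, rather than sharing one set of parameters across all channels, is what lets different series reach different experts. How the paper derives the routing features from temporal characteristics is not stated in the abstract, so random features stand in for them here.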