Abstract: The recent boom of Transformer-based models has advanced the state of the art in multivariate time series (MTS) forecasting. However, MTS forecasting remains a challenging problem, primarily because of intricate temporal patterns and obscured spatial correlations. Existing models are not only computationally expensive when modeling long-term temporal dependencies but, more importantly, fail to adequately capture the interrelations among variables. To address these problems, we propose STNet, a Spatial-Temporal Transformer Network with a self-supervised pre-training scheme for MTS forecasting. In STNet, the MTS is formulated as a data-driven graph structure that is learned during training to extract the latent patterns underlying the spatial dependencies of the data. We then encode the structural information of this graph into the spatial Transformer encoder, helping STNet model refined spatial dependencies. A patch-level Transformer encoder is employed to efficiently enhance locality and comprehensively capture the semantic information of temporal dependencies. Moreover, the pre-trained model yields rich contextual representations, which consistently lead to reliable transfer performance on downstream tasks. Extensive experiments on five real-world benchmark datasets show that STNet improves prediction accuracy by 5.0%-25.2% over previous state-of-the-art methods.
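The abstract does not give implementation details, but the two core components it names can be illustrated with a minimal, hedged sketch: (i) a data-driven graph learned from trainable node embeddings and (ii) a patch-level Transformer encoder over the temporal axis. All class names, dimensions, and the specific adjacency construction below are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (assumptions, not the STNet implementation) of:
#   (i) a data-driven graph over the N series learned from node embeddings,
#   (ii) a patch-level Transformer encoder for temporal dependencies.
import torch
import torch.nn as nn


class LearnedGraph(nn.Module):
    """Learns a dense adjacency over the N series from trainable embeddings."""

    def __init__(self, num_nodes: int, emb_dim: int = 32):
        super().__init__()
        self.emb = nn.Parameter(torch.randn(num_nodes, emb_dim))

    def forward(self) -> torch.Tensor:
        # Pairwise similarity of node embeddings, normalized row-wise,
        # gives a data-driven adjacency matrix learned end to end.
        logits = self.emb @ self.emb.t()              # (N, N)
        return torch.softmax(logits, dim=-1)


class PatchTemporalEncoder(nn.Module):
    """Splits each series into patches and encodes them with a Transformer."""

    def __init__(self, patch_len: int = 16, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_nodes, seq_len); seq_len assumed divisible by patch_len.
        b, n, t = x.shape
        patches = x.reshape(b * n, t // self.patch_len, self.patch_len)
        tokens = self.proj(patches)                   # (b*n, num_patches, d_model)
        enc = self.encoder(tokens)
        return enc.reshape(b, n, -1, enc.size(-1))    # (b, n, num_patches, d_model)


if __name__ == "__main__":
    x = torch.randn(8, 7, 96)                         # batch of 7-variable series
    adj = LearnedGraph(num_nodes=7)()                 # (7, 7) learned adjacency
    feats = PatchTemporalEncoder()(x)                 # patch-level temporal features
    print(adj.shape, feats.shape)
```

In a full model, the learned adjacency would condition a spatial Transformer encoder and the patch-level features would feed the forecasting head; how STNet combines them, and its self-supervised pre-training objective, are described in the paper itself.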