S$^2$Transformer: Scalable Structured Transformers for Global Station Weather Forecasting

TMLR Paper 5895 Authors

15 Sept 2025 (modified: 02 Mar 2026). Decision pending for TMLR. License: CC BY 4.0
Abstract: Global Station Weather Forecasting (GSWF) is a key area of meteorological research, critical to energy, aviation, and agriculture. When forecasting for large numbers of stations worldwide, existing time series forecasting methods either ignore spatial correlation or model it only unidirectionally. This contradicts the intrinsic structure of observations of the global weather system and limits forecast performance. To address this, we propose a novel Spatial Structured Attention Block. It partitions the spatial graph into a set of subgraphs, applies Intra-subgraph Attention to learn local spatial correlation within each subgraph, and aggregates nodes into subgraph representations for message passing among subgraphs via Inter-subgraph Attention, accounting for both spatial proximity and global correlation. Building on this block, we develop S$^2$Transformer, a multiscale spatiotemporal forecasting model that progressively expands subgraph scales. The resulting model is scalable, produces structured spatial correlation, and is easy to implement. Experiments show that it improves over time series forecasting baselines by up to 16.8% at low running cost.
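The block described in the abstract maps naturally onto standard attention primitives: per-subgraph attention over stations, pooling to one token per subgraph, and attention over those tokens. Below is a minimal PyTorch sketch of this structure, not the authors' implementation (see the linked repository for that); the `assignment` partition, the mean pooling of subgraph tokens, and the broadcast-and-add fusion step are all illustrative assumptions.

```python
# Illustrative sketch of a Spatial Structured Attention Block.
# Assumptions (not from the paper): mean pooling for subgraph tokens,
# broadcast-and-add fusion, and a precomputed node-to-subgraph assignment.
import torch
import torch.nn as nn


class SpatialStructuredAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_subgraphs: int):
        super().__init__()
        self.n_subgraphs = n_subgraphs
        # Intra-subgraph attention: local spatial correlation within a subgraph.
        self.intra_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter-subgraph attention: message passing among subgraph representations.
        self.inter_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, assignment: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_stations, d_model) station embeddings
        # assignment: (n_stations,) subgraph index per station (each subgraph
        # assumed non-empty), e.g. from a spatial graph partitioner
        out = x.clone()
        sub_tokens = []
        for g in range(self.n_subgraphs):
            idx = (assignment == g).nonzero(as_tuple=True)[0]
            nodes = x[:, idx]                                # (B, |V_g|, d)
            local, _ = self.intra_attn(nodes, nodes, nodes)  # local correlation
            out[:, idx] = local
            sub_tokens.append(local.mean(dim=1))             # pool to one token
        # Exchange global messages among subgraph tokens.
        tokens = torch.stack(sub_tokens, dim=1)              # (B, n_subgraphs, d)
        mixed, _ = self.inter_attn(tokens, tokens, tokens)
        # Broadcast each subgraph's global message back to its member stations.
        for g in range(self.n_subgraphs):
            idx = (assignment == g).nonzero(as_tuple=True)[0]
            out[:, idx] = out[:, idx] + mixed[:, g].unsqueeze(1)
        return out
```

The multiscale model described in the abstract would then stack such blocks while progressively coarsening the partition (fewer, larger subgraphs at deeper layers), so that local detail and global correlation are captured at successive scales.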
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the manuscript to address the comments from the Action Editor and Reviewers. The main changes are as follows:
1. We added a new ablation variant w/o MA in Section 5.4 to evaluate the impact of the hierarchical multiscale architecture.
2. We provided a rigorous mathematical definition of "structured spatial correlation" in Appendix C.
3. We included a performance comparison between S$^2$Transformer and physics-informed models in Appendix E.
4. We added Appendix F, which discusses the model's sensitivity to the number of subgraphs and the robustness of the graph partitioning strategy.
5. We rewrote the Broader Impact section to provide a more comprehensive discussion of the societal implications of our work.
Code: https://github.com/hongyichenhitsz/S2Transformer
Assigned Action Editor: ~Chuxu_Zhang2
Submission Number: 5895