Multi-modal Spatiotemporal Forecasting via Cross-Scale Operator Learning and Spatial Representation Aggregation

Yajun Gao, Tianrui Ma, Chujie Xu, Miao Wang

Published: 01 Jan 2024, Last Modified: 26 Jan 2026, Source: Crossref, License: CC BY-SA 4.0

Abstract: Vertically integrated liquid (VIL) is an effective indicator for assessing severe weather caused by strong convection, so forecasting it accurately is crucial. In highly complex and interconnected atmospheric systems, the evolution of VIL is often linked to other physical factors such as cloud cover, water vapor, and temperature. However, most previous studies have applied time series prediction methods that rely only on the statistical patterns of observed VIL, neglecting the correlations between VIL and its physical background. This limitation makes it difficult to achieve reliable and accurate predictions that are consistent with the overall atmospheric dynamics. We therefore make full use of physical spatiotemporal data obtained from satellites and propose a novel multi-modal spatiotemporal forecasting model. The model includes a Cross-scale Operator Learner (COL) and a Spatial Representation Aggregator (SRA), which extract and integrate physical background information to align model predictions with real physical processes. The COL discovers a common nonlinear operator from correlated multi-modal data. It explores various nonlinear functions that map the discrete input space to a hidden physical spatial space, thereby obtaining an atmospheric spatial representation of atmospheric evolution processes. The SRA integrates the atmospheric spatial representation with VIL’s spatial representation, generating an enhanced spatiotemporal representation that carries both VIL spatial details and the atmospheric physical background. This representation supports better temporal embedding and improves the model’s accuracy in predicting VIL. We refined the SEVIR dataset to ensure complete and aligned modalities, and conducted extensive experiments demonstrating that our model outperforms other state-of-the-art models.
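The data flow described in the abstract can be pictured with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names `shared_operator` and `aggregate`, the two-layer pointwise map, the averaging across modalities, the gated fusion, and all tensor shapes and random weights are assumptions chosen only to illustrate the idea of one common operator applied to several modalities, followed by fusion with a VIL representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_operator(x, w1, w2):
    """Hypothetical stand-in for a common nonlinear operator (assumption).

    The same weights are applied to every modality, mapping each grid
    point's input value to a D-dimensional hidden representation.
    x: (T, H, W, 1), w1: (1, D), w2: (D, D) -> (T, H, W, D)
    """
    return np.tanh(np.tanh(x @ w1) @ w2)

def aggregate(atmos_repr, vil_repr, w_gate):
    """Hypothetical gated fusion of the two representations (assumption)."""
    joint = np.concatenate([atmos_repr, vil_repr], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate)))  # sigmoid gate in (0, 1)
    return gate * vil_repr + (1.0 - gate) * atmos_repr

# Toy sizes: T frames on an H x W grid, hidden width D (all illustrative).
T, H, W, D = 4, 8, 8, 16

# Random stand-ins for aligned background modalities and for VIL.
modalities = {
    name: rng.standard_normal((T, H, W, 1))
    for name in ("cloud_cover", "water_vapor", "temperature")
}
vil = rng.standard_normal((T, H, W, 1))

# Random weights in place of learned parameters.
w1 = rng.standard_normal((1, D)) * 0.1
w2 = rng.standard_normal((D, D)) * 0.1
w_vil = rng.standard_normal((1, D)) * 0.1
w_gate = rng.standard_normal((2 * D, D)) * 0.1

# One operator shared across modalities; averaging gives a single
# atmospheric representation (the averaging step is an assumption).
atmos_repr = np.mean(
    [shared_operator(x, w1, w2) for x in modalities.values()], axis=0
)
vil_repr = np.tanh(vil @ w_vil)

# Fused representation that a temporal model could consume downstream.
fused = aggregate(atmos_repr, vil_repr, w_gate)
print(fused.shape)  # (4, 8, 8, 16)
```

In this sketch the gate decides, per grid point and per channel, how much of the VIL detail versus the atmospheric background is passed on; a trained model would learn all of these weights rather than draw them at random.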

External IDs: doi:10.1007/978-981-97-6125-8_9