Enhancing spatiotemporal prediction through the integration of Mamba state space models and Diffusion Transformers
Abstract: This paper presents MAD, an advanced architecture for spatiotemporal prediction that integrates Mamba modules with Diffusion Transformers for efficient spatiotemporal modeling. The model consists of three phases: encoding, reconstruction, and prediction. First, the encoder transforms raw spatiotemporal data into compact latent embeddings. In the reconstruction phase, the Mamba module processes these embeddings through normalization and bidirectional state space models, generating reconstructed representations that are then decoded to restore the input data. The prediction phase uses the Diffusion Transformer to model spatiotemporal features, incorporating time embeddings and leveraging self-attention mechanisms to capture complex spatiotemporal dependencies. Finally, the model jointly trains the reconstruction and prediction paths to achieve high-precision spatiotemporal forecasts. Experimental results demonstrate the model's superior performance across various spatiotemporal prediction tasks, validating its effectiveness and robustness. Our code is available at https://github.com/Hanson1331/KBS-MAD.
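The three-phase pipeline summarized above (encoding, Mamba-based reconstruction, and Diffusion-Transformer-based prediction) can be sketched schematically. The sketch below is a toy illustration of the data flow only: all function names, shapes, and the simplified stand-ins for the bidirectional state space pass and the DiT attention step are assumptions, not the authors' implementation.

```python
import numpy as np

def encode(x, d_latent=32):
    # Toy encoder: linearly project raw frames (T, H*W) to latents (T, d_latent).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((x.shape[-1], d_latent)) * 0.01
    return x @ W

def mamba_reconstruct(z):
    # Stand-in for the bidirectional state-space pass: a running mean scanned
    # forward and backward over time, then averaged. Real Mamba blocks use
    # learned selective SSM parameters instead of this fixed scan.
    T = z.shape[0]
    fwd = np.cumsum(z, axis=0) / np.arange(1, T + 1)[:, None]
    bwd = np.cumsum(z[::-1], axis=0)[::-1] / np.arange(T, 0, -1)[:, None]
    return 0.5 * (fwd + bwd)

def diffusion_transformer_predict(z, t_embed):
    # Stand-in for the DiT branch: add a time embedding, then apply one
    # self-attention step over the temporal axis.
    h = z + t_embed
    scores = h @ h.T / np.sqrt(h.shape[-1])
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ h

T, HW, d = 8, 64, 32
x = np.random.default_rng(1).standard_normal((T, HW))  # toy spatiotemporal input
z = encode(x, d)
z_rec = mamba_reconstruct(z)                 # reconstruction path
t_embed = np.sin(np.arange(d) / d)           # toy sinusoidal time embedding
y_pred = diffusion_transformer_predict(z, t_embed)  # prediction path
print(z.shape, z_rec.shape, y_pred.shape)
```

In training, the paper's joint objective would combine a reconstruction loss on the decoded `z_rec` with a prediction loss on `y_pred`; the decoder and loss weighting are omitted here for brevity.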