Enhancing spatiotemporal prediction through the integration of Mamba state space models and Diffusion Transformers
Abstract: This paper presents MAD, an advanced architecture for spatiotemporal prediction that integrates Mamba modules with Diffusion Transformers for efficient spatiotemporal modeling. The model consists of three phases: encoding, reconstruction, and prediction. First, the encoder transforms raw spatiotemporal data into compact latent embeddings. In the reconstruction phase, the Mamba module processes these embeddings through normalization and bidirectional state space models, generating reconstructed representations that are then decoded to restore the input data. The prediction phase uses the Diffusion Transformer to model spatiotemporal features, incorporating time embeddings and leveraging self-attention mechanisms to capture complex spatiotemporal dependencies. Finally, the model jointly trains the reconstruction and prediction paths to achieve high-precision spatiotemporal forecasts. Experimental results demonstrate the model's superior performance across various spatiotemporal prediction tasks, validating its effectiveness and robustness. Our code is available at https://github.com/Hanson1331/KBS-MAD.
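The three-phase pipeline summarized above (encoding, Mamba-based reconstruction, and Diffusion-Transformer-based prediction) can be sketched schematically. The sketch below is a toy illustration of the data flow only: all function names, shapes, and the simplified stand-ins for the bidirectional state space pass and the DiT attention step are assumptions, not the authors' implementation.

```python
import numpy as np

def encode(x, d_latent=32):
    # Toy encoder: linearly project raw frames (T, H*W) to latents (T, d_latent).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((x.shape[-1], d_latent)) * 0.01
    return x @ W

def mamba_reconstruct(z):
    # Stand-in for the bidirectional state-space pass: a running mean scanned
    # forward and backward over time, then averaged. Real Mamba blocks use
    # learned selective SSM parameters instead of this fixed scan.
    T = z.shape[0]
    fwd = np.cumsum(z, axis=0) / np.arange(1, T + 1)[:, None]
    bwd = np.cumsum(z[::-1], axis=0)[::-1] / np.arange(T, 0, -1)[:, None]
    return 0.5 * (fwd + bwd)

def diffusion_transformer_predict(z, t_embed):
    # Stand-in for the DiT branch: add a time embedding, then apply one
    # self-attention step over the temporal axis.
    h = z + t_embed
    scores = h @ h.T / np.sqrt(h.shape[-1])
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)
    return attn @ h

T, HW, d = 8, 64, 32
x = np.random.default_rng(1).standard_normal((T, HW))  # toy spatiotemporal input
z = encode(x, d)
z_rec = mamba_reconstruct(z)                 # reconstruction path
t_embed = np.sin(np.arange(d) / d)           # toy sinusoidal time embedding
y_pred = diffusion_transformer_predict(z, t_embed)  # prediction path
print(z.shape, z_rec.shape, y_pred.shape)
```

In training, the paper's joint objective would combine a reconstruction loss on the decoded `z_rec` with a prediction loss on `y_pred`; the decoder and loss weighting are omitted here for brevity.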