Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models

17 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. Readers: Everyone. License: CC BY 4.0
Keywords: World Model, Spatio-temporal data mining
Abstract: Physical spatiotemporal forecasting poses a dual challenge. First, the inherent stochasticity of physical systems makes it difficult to capture extreme or rare events, especially under \textit{data scarcity}. Second, many critical domain-specific metrics are \textit{non-differentiable}, which precludes their direct optimization by conventional deep learning models. To address these challenges, we introduce a new paradigm, \textbf{\textit{Spatiotemporal Forecasting as Planning}}, and propose \textbf{\method{}}, a framework grounded in Model-Based Reinforcement Learning. First, \method{} constructs a novel Generative World Model to learn and simulate the dynamics of the physical system. This world model comprises a deterministic base network and a probabilistic Multi-scale Top-K Vector Quantized decoder. It provides not only a single-point prediction of the future but also a distribution of diverse, high-fidelity future states, enabling "imagination-based" simulation of the environment's evolution. Building on this foundation, the base forecasting model acts as an \textbf{\textit{Agent}}, whose output is treated as an action that guides exploration. We then introduce a \textbf{\textit{Planning Algorithm based on Beam Search}}. This algorithm performs forward exploration within the learned world model, using the non-differentiable domain metrics as a \textbf{\textit{Reward Signal}} to identify high-return future sequences. Finally, the high-reward candidates identified through planning serve as high-quality pseudo-labels that continuously optimize the agent's \textbf{\textit{Policy}} through an iterative self-training process. The \method{} framework integrates world model learning with reward-based planning, which addresses the challenge of optimizing non-differentiable objectives and mitigates data scarcity via exploration in its internal simulations.
Comprehensive experiments on multiple benchmarks show that \method{} not only significantly reduces prediction error (e.g., up to 39\% MSE reduction) but also demonstrates exceptional performance on critical domain metrics, including physical consistency and the ability to capture extreme events.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 8506
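
Illustrative sketch (not from the submission): the planning loop described in the abstract can be pictured roughly as follows. Everything here is an assumption made for illustration. `agent_step` stands in for the deterministic base forecaster, `world_model_sample` stands in for the generative world model (plain Gaussian perturbations, not the Multi-scale Top-K Vector Quantized decoder), and `reward` uses a thresholded critical success index as one example of a non-differentiable domain metric scored against training targets. The function names, beam width, and noise scale are all hypothetical.

```python
import numpy as np


def agent_step(state):
    # Hypothetical base forecaster ("agent"): a simple damping rule.
    return 0.9 * state


def world_model_sample(action, k, rng):
    # Hypothetical generative world model: k stochastic candidates
    # centred on the agent's proposed next state.
    noise = rng.normal(scale=0.1, size=(k,) + action.shape)
    return action[None] + noise


def reward(pred, target, threshold=1.0):
    # Non-differentiable metric (assumed for illustration): critical
    # success index for threshold exceedance, hits / (hits + misses + false alarms).
    p, t = pred >= threshold, target >= threshold
    hits = np.sum(p & t)
    misses = np.sum(~p & t)
    false_alarms = np.sum(p & ~t)
    denom = hits + misses + false_alarms
    return 1.0 if denom == 0 else float(hits / denom)


def beam_search_plan(state0, targets, beam_width=3, k=4, seed=0):
    # Each beam holds (cumulative reward, sequence so far, latest state).
    rng = np.random.default_rng(seed)
    beams = [(0.0, [], state0)]
    for target in targets:
        candidates = []
        for score, seq, state in beams:
            action = agent_step(state)
            for nxt in world_model_sample(action, k, rng):
                candidates.append((score + reward(nxt, target), seq + [nxt], nxt))
        # Keep only the highest-return partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_seq, _ = beams[0]
    return best_score, np.stack(best_seq)


state0 = np.full((4, 4), 1.2)
targets = [np.full((4, 4), v) for v in (1.1, 1.0, 0.9)]
score, pseudo_label = beam_search_plan(state0, targets)
```

In a self-training loop of the kind the abstract describes, `pseudo_label` would then serve as a supervised target for updating the agent with an ordinary differentiable loss, so the non-differentiable metric influences training only through candidate selection.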