Reinforcing Spatio-Temporal Graph Neural Networks with a Physics Reward Oracle

Hansheng Zeng; Yuqi Li; Chuanguang Yang; Weilun Feng; Zeyu Dong; Yao Lu; Yingli Tian; Hao Wu

20 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0

Keywords: spatio-temporal; out-of-distribution

Abstract: Modeling and predicting spatio-temporal dynamical systems is pivotal in numerous scientific and engineering domains, yet the inherent complexity of these systems and the stringent requirement for out-of-distribution (OOD) generalization pose significant challenges. While existing models based on Graph Neural Networks (GNNs) excel at capturing spatio-temporal dependencies, they are often insufficiently robust and estimate uncertainty inaccurately when confronted with unseen or perturbed dynamical patterns. To address this challenge, this paper proposes a novel training framework inspired by Direct Preference Optimization (DPO) to enhance the OOD generalization capabilities of Multi-scale Spatio-Temporal Graph (MSTG) models for dynamical systems. At the core of our approach is an automated "physics preference oracle" that leverages uncertainty estimation and system perturbations to generate paired trajectory preference data. Specifically, the model generates multiple candidate future trajectories by applying perturbations to the input or by leveraging its inherent stochasticity. The oracle then automatically evaluates these trajectories and labels them as "preferred" or "dispreferred" based on metrics such as physical consistency, robustness against perturbations, and the reliability of the predicted uncertainty. Using this preference dataset, we introduce a DPO-style loss function to directly optimize the MSTG model, encouraging it to favor predictions that are more consistent with physical laws, that are more resilient to perturbations, and that come with reliable uncertainty estimates (i.e., high uncertainty in OOD scenarios). This method aims to elevate dynamical system modeling from mere data fitting to learning and internalizing the intrinsic physical properties and robust behaviors of the system.
Experiments demonstrate that our proposed framework significantly improves the OOD generalization, prediction accuracy, and quality of uncertainty quantification for MSTG models in complex spatio-temporal modeling and prediction tasks. This research offers a new perspective on leveraging DPO-like reinforcement learning paradigms to tackle fundamental challenges in scientific computing.
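The pipeline described in the abstract (generate candidate trajectories, let an oracle pick a preferred and a dispreferred one, then apply a DPO-style loss) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy harmonic oscillator, the `energy_drift` criterion standing in for the physics oracle, the function names, and the use of the standard DPO objective, -log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), in place of the paper's own loss. In a real setting the log-probabilities would come from the model's predictive distribution; here they are passed in as plain numbers.

```python
import math
import random


def energy_drift(trajectory):
    """Toy physics check (assumed): variance of E = 0.5 * (x^2 + v^2) over time."""
    energies = [0.5 * (x * x + v * v) for x, v in trajectory]
    mean_e = sum(energies) / len(energies)
    return sum((e - mean_e) ** 2 for e in energies) / len(energies)


def oracle_pair(candidates):
    """Hypothetical oracle: return (preferred, dispreferred) by energy drift."""
    ranked = sorted(candidates, key=energy_drift)
    return ranked[0], ranked[-1]


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def rollout(x0, v0, steps, dt, noise, rng):
    """Candidate trajectory of a unit harmonic oscillator with noisy updates."""
    traj, x, v = [(x0, v0)], x0, v0
    for _ in range(steps):
        # Semi-implicit Euler step plus a Gaussian perturbation on the velocity.
        v = v - dt * x + rng.gauss(0.0, noise)
        x = x + dt * v
        traj.append((x, v))
    return traj


rng = random.Random(0)
# Candidates differ only in how strongly each step is perturbed.
candidates = [rollout(1.0, 0.0, 50, 0.05, noise, rng) for noise in (0.0, 0.05, 0.2)]
preferred, dispreferred = oracle_pair(candidates)

# Placeholder log-probabilities (assumed values) for the policy and reference.
loss = dpo_loss(logp_w=-1.0, logp_l=-5.0, ref_logp_w=-3.0, ref_logp_l=-3.0)
```

When the policy and the reference assign identical log-probabilities to both trajectories, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the preferred trajectory relative to the reference.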

Primary Area: learning on time series and dynamical systems

Submission Number: 23578
