Direct Preference Optimization for Dynamical System Modeling

ICLR 2026 Conference Submission16815 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: AI for Physics

Abstract: Accurately predicting complex dynamical systems is crucial for scientific research and engineering practice, with applications such as weather forecasting and fluid dynamics. However, off-the-shelf deep learning methods rely only on numerical metrics, so they often fail to capture \textbf{rare} events and ignore \textbf{human needs} for physical consistency and interpretability. With this in mind, this paper proposes \method{}, a human-machine collaborative framework for dynamical system prediction that combines numerical accuracy with human preference scores. First, we pre-train the base model by minimizing the expected risk to obtain a reliable convergence landscape. Then, we plug in a diverse sampling strategy to generate different candidate predictions and use human-trusted metrics to select pairs of high- and low-quality predictions, which are used to train a preference model. Finally, we jointly optimize the fixed preference objective with the pre-trained prediction model to improve both numerical accuracy and human-perceptible quality. Our theoretical analysis shows that, under certain conditions, this process can be viewed as a bi-level optimization or game problem and can converge to an equilibrium solution. Experimental results demonstrate that \method{} not only reduces overall risk across various dynamical system scenarios, including numerical weather forecasting and fluid vortex simulation, but also significantly outperforms existing state-of-the-art methods in visual consistency and in capturing extreme events.
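
The candidate-sampling and pair-selection step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the submission's implementation: the Gaussian-noise sampler `sample_candidates`, the use of negative mean squared error as a stand-in for a human-trusted metric in `trusted_score`, and the logistic pairwise loss in `pairwise_preference_loss` are all hypothetical choices.

```python
import math
import random


def sample_candidates(base_prediction, num_candidates, noise_scale, rng):
    """Hypothetical diverse sampler: perturb a base prediction with Gaussian noise."""
    return [
        [x + rng.gauss(0.0, noise_scale) for x in base_prediction]
        for _ in range(num_candidates)
    ]


def trusted_score(prediction, reference):
    """Stand-in for a human-trusted metric (assumption): negative MSE, higher is better."""
    return -sum((p - r) ** 2 for p, r in zip(prediction, reference)) / len(reference)


def select_preference_pair(candidates, reference):
    """Return (preferred, rejected): the highest- and lowest-scoring candidates."""
    ranked = sorted(candidates, key=lambda c: trusted_score(c, reference))
    return ranked[-1], ranked[0]


def pairwise_preference_loss(score_preferred, score_rejected, beta=1.0):
    """Logistic pairwise loss: -log(sigmoid(beta * (s_preferred - s_rejected)))."""
    margin = beta * (score_preferred - score_rejected)
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [0.0, 1.0, 2.0]
    candidates = sample_candidates(reference, num_candidates=5, noise_scale=0.5, rng=rng)
    preferred, rejected = select_preference_pair(candidates, reference)
    loss = pairwise_preference_loss(
        trusted_score(preferred, reference), trusted_score(rejected, reference)
    )
    print(f"pairwise loss: {loss:.4f}")
```

In this sketch the loss shrinks as the preferred candidate's score exceeds the rejected candidate's score by a wider margin; a learned preference model would replace `trusted_score` when scoring the pair.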

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 16815
