ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation

Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter M. Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles Stern, Tom Beucler, Bryce E. Harrop, Helge Heuer, Benjamin R. Hillman, Andrea M. Jenney, Nana Liu et al. (27 additional authors not shown)

Published: 2025, Last Modified: 07 May 2026J. Mach. Learn. Res. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5, which provides a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various baselines of ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, also in a low-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_low-res and an aquaplanet version at https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations.
Loading