Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms

Published: 09 Jun 2025, Last Modified: 14 Jul 2025 · CODEML@ICML25 · CC BY 4.0
Keywords: machine learning, open source, common task framework, benchmark, scientific ML
TL;DR: We propose the open-source Common Task Framework for benchmarking scientific machine learning models.
Abstract: To address the problem of rapid model development outpacing the creation of standardized, objective benchmarks, we propose a Common Task Framework (CTF) for evaluating scientific machine learning models on dynamical systems. The CTF features a curated set of datasets and task-specific metrics spanning state forecasting, state reconstruction, and generalization under realistic constraints, including noise and limited data. Inspired by the success of CTFs in other machine learning fields such as natural language processing and computer vision, our framework provides a structured, rigorous foundation for head-to-head evaluation of diverse algorithms. Our open-source framework enables researchers to rapidly implement, test, and optimize their models against our datasets, supporting our long-term vision of raising the bar for rigor and reproducibility in scientific ML.
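To make the benchmarking idea concrete, the following is a minimal sketch of what a CTF-style forecasting evaluation might look like. All names here (`normalized_rmse`, the toy oscillator, the stand-in forecast) are illustrative assumptions, not the framework's actual API or datasets:

```python
# Hypothetical sketch of a CTF-style state-forecasting evaluation
# (illustrative only; not the actual Common Task Framework API).
import numpy as np

def normalized_rmse(true: np.ndarray, pred: np.ndarray) -> float:
    """RMSE of the forecast, scaled by the standard deviation of the truth."""
    return float(np.sqrt(np.mean((true - pred) ** 2)) / np.std(true))

# Toy "dynamical system": a damped oscillator sampled on a uniform time grid.
t = np.linspace(0.0, 10.0, 500)
true_state = np.exp(-0.1 * t) * np.sin(t)

# Stand-in "model" forecast: the true trajectory plus small observation noise,
# mimicking the realistic noisy-data conditions the benchmark targets.
rng = np.random.default_rng(0)
forecast = true_state + rng.normal(scale=0.01, size=t.shape)

score = normalized_rmse(true_state, forecast)
print(f"forecast NRMSE: {score:.4f}")
```

A real CTF task would replace the toy oscillator with a curated dataset and report one such task-specific metric per task (forecasting, reconstruction, generalization), enabling head-to-head comparison across algorithms.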
Submission Number: 38