Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Position Paper Track (poster) · License: CC BY 4.0
TL;DR: We argue that graph learning needs a significant revision of benchmarks and benchmarking culture to stay relevant.
Abstract: While machine learning on graphs has demonstrated promise in drug design and molecular property prediction, significant benchmarking challenges hinder its further progress and relevance. Current benchmarking practices often lack focus on transformative, real-world applications, favoring narrow domains like two-dimensional molecular graphs over broader, impactful areas such as combinatorial optimization, databases, or chip design. Additionally, many benchmark datasets poorly represent the underlying data, leading to inadequate abstractions and misaligned use cases. Fragmented evaluations and an excessive focus on accuracy further exacerbate these issues, incentivizing overfitting rather than fostering generalizable insights. These limitations have prevented the development of truly useful graph foundation models. This position paper calls for a paradigm shift toward more meaningful benchmarks, rigorous evaluation protocols, and stronger collaboration with domain experts to drive impactful and reliable advances in graph learning research and unlock its full potential.
Lay Summary: Graph learning is a field of machine learning concerned with processing objects that describe relationships between entities, mathematically known as graphs. For example, a "molecular graph" represents a molecule by describing the relationships between its atoms, i.e., their chemical bonds. An example of a graph learning task is predicting chemically relevant properties of molecular graphs. The effectiveness of machine learning methods is typically measured on benchmarks: sets of controlled experiments on predefined data that allow researchers to track relevant research progress. An ideal benchmark originates from, and is closely connected to, a real-world problem: strong benchmark results should indicate that research methods can be successfully applied to real-world use cases. In this position paper we claim, however, that the progress of graph learning research is currently hindered by poor benchmarks, along with an unhealthy benchmarking culture. We argue that the most popular benchmarks are overly focused on prediction tasks over molecular graphs, while benchmarks for other promising applications are lacking. Additionally, we identify benchmarks that poorly model the relationships in the underlying data or fail to connect to relevant real-world use cases. We also underscore how newly proposed methods are often compared against baseline approaches that are not tuned to perform at their full potential. We support many of these claims with experimental evidence and provide multiple examples of real-world applications for which new graph learning benchmarks would be valuable, including the design of computer chips, weather forecasting, applications to relational databases, and combinatorial optimization problems with impact in, e.g., logistics.
Link To Code: https://github.com/benfinkelshtein/PP-Benchmarks
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: MPNNs, GNNs, graphs, benchmarking
Submission Number: 5