A comprehensive benchmark of graph neural networks, graph kernels, and classical machine learning approaches on rs-fMRI brain graphs

03 Dec 2025 (modified: 15 Dec 2025) | MIDL 2026 Validation Papers Submission | Readers: Everyone | CC BY 4.0
Keywords: resting-state fMRI, brain networks, graph kernels, graph neural networks, benchmarking, reproducibility, computational efficiency
TL;DR: This paper benchmarks graph kernels, classical machine learning algorithms, and graph neural networks on resting-state fMRI brain networks, with a focus on reproducibility and computational efficiency.
Abstract: Resting-state functional MRI (rs-fMRI) provides a powerful lens through which large-scale brain organization can be examined by modeling functional connectivity as a graph. These functional brain graphs now form the basis of machine-learning applications in neuroscience, ranging from relatively straightforward classification problems to more challenging behavioral and cognitive prediction tasks. While graph neural networks (GNNs) have gained increasing attention in neuroimaging, the absence of a unified, reproducible benchmark comparing GNNs with classical machine-learning models and graph kernel methods, across heterogeneous datasets and tasks, has made it difficult to assess their relative strengths. In this work, we introduce a comprehensive benchmarking framework spanning four heterogeneous cohorts (N = 1513) and multiple classification tasks, including clinical diagnosis and phenotypic prediction. We systematically evaluate classical models, graph kernels, and representative GNN architectures under a rigorous repeated nested cross-validation design and assess pairwise differences using the corrected repeated k-fold test with false-discovery-rate control. Our results show that, for this class of relatively small graphs with fixed vertex ordering, well-tuned classical ML approaches and graph kernels are competitive with GNNs, while requiring substantially fewer computational resources. For instance, the Shortest-Path graph kernel achieves 0.98 accuracy on the COMA dataset, logistic regression reaches 0.81 accuracy and 0.63 MCC on HCP sex prediction, and all model families cluster closely on multi-site datasets such as ABIDE and ADHD, where no statistically significant differences emerge. All code, seeds, cross-validation folds, fold-specific hyperparameters, full prediction logs, and computational-cost measurements are publicly released at https://gitlab.inria.fr/rmhanna/benchmark-study to ensure full transparency and reproducibility.
This benchmark provides practical guidance for model selection in rs-fMRI connectome analysis.
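For readers unfamiliar with the statistical comparison named in the abstract, the following is a minimal sketch and not the authors' released implementation. It assumes that the "corrected repeated k-fold test" refers to the variance-corrected resampled t-statistic (Nadeau and Bengio; Bouckaert and Frank) and that false-discovery-rate control uses the Benjamini-Hochberg procedure; the abstract does not confirm either choice. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import mean, variance


def corrected_repeated_kfold_t(diffs, n_train, n_test):
    """Variance-corrected t-statistic for repeated k-fold CV (assumed formulation).

    diffs   : per-fold score differences between two models, pooled over
              all repeats (length = repeats * folds)
    n_train : number of training samples in one fold split
    n_test  : number of test samples in one fold split
    Returns (t_statistic, degrees_of_freedom).
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two fold differences")
    var = variance(diffs)  # unbiased sample variance (divides by n - 1)
    if var == 0:
        raise ValueError("zero variance: the statistic is undefined")
    # The extra n_test / n_train term inflates the variance to account for
    # the overlap between training sets across folds and repeats.
    t_stat = mean(diffs) / math.sqrt((1.0 / n + n_test / n_train) * var)
    return t_stat, n - 1


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy usage with made-up numbers (not results from the paper).
t_stat, dof = corrected_repeated_kfold_t([0.1, 0.2, 0.3], n_train=2, n_test=1)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A two-sided p-value would then be obtained from a Student t distribution with `dof` degrees of freedom, for example with `scipy.stats.t.sf`, and the p-values of all pairwise model comparisons would be passed together to `benjamini_hochberg`.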
Primary Subject Area: Application: Neuroimaging
Secondary Subject Area: Safe and Trustworthy Learning-assisted Solutions for Medical Imaging
Registration Requirement: Yes
Reproducibility: https://gitlab.inria.fr/rmhanna/benchmark-study
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 32