Keywords: Neural Algorithmic Reasoning, Graph Neural Networks
TL;DR: We reveal distribution bias in NAR benchmarks, propose the GOOD score to measure out-of-distribution generalisation more accurately, and find asymptotic failures of NAR relative to simple GNN baselines.
Abstract: We discover distribution bias in the evaluation of neural algorithmic reasoning (NAR) that misrepresents out-of-distribution (OOD) generalisation of neural networks. Comparing against the Triplet-GMPNN baseline, we find that a simple GNN outperforms the NAR model on 46% of algorithmic reasoning tasks in the CLRS benchmark and is within one standard deviation on 67%. We show that this result is biased by test sets that use specific problem instance sizes rather than a distribution of problem sizes. To address this, we introduce the Generalisation Out-of-Distribution (GOOD) score, a simple way to measure NAR generalisation as the area under a test-score versus problem-size curve. Through analysis with the GOOD score and empirical curves, we identify that NAR generalisation is better than previously reported but is still often outperformed asymptotically by simple GNN baselines, highlighting new opportunities to improve NAR.
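The GOOD score described above could be computed as a normalised area under the curve of test score against problem size. The sketch below is a hypothetical illustration (the function name, normalisation, and numerical integration scheme are assumptions, not the authors' definition), assuming higher scores at larger sizes indicate better OOD generalisation:

```python
import numpy as np

def good_score(sizes, scores):
    """Hypothetical sketch of a GOOD-style score: area under the
    test-score vs problem-size curve, normalised by the size range
    so that a model scoring 1.0 at every size receives a score of 1.0.
    """
    sizes = np.asarray(sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(sizes)          # integrate over increasing size
    sizes, scores = sizes[order], scores[order]
    area = np.trapz(scores, sizes)     # trapezoidal area under the curve
    return area / (sizes[-1] - sizes[0])
```

For example, a model that holds a perfect score across sizes 16 to 64 would get a score of 1.0, while one that decays linearly from 1.0 to 0.0 over that range would get 0.5, capturing the asymptotic degradation the abstract describes.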
Submission Type: Extended abstract (max 4 main pages).
Submission Number: 92