MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning

Published: 23 Sept 2025 · Last Modified: 21 Oct 2025 · NPGML Poster · CC BY 4.0
Keywords: graph neural networks, algorithmic alignment, neural algorithmic reasoning, mechanistic interpretability, circuit discovery
TL;DR: We introduce a circuit discovery method to study neural algorithmic reasoning in GNNs
Abstract: Graph neural networks (GNNs) are known to be capable of implementing specific algorithmic steps that guarantee strong out-of-distribution performance, a property referred to as algorithmic alignment or neural algorithmic reasoning (NAR). At the same time, recent advances in the reasoning capabilities of large language models (LLMs) have spurred interest in mechanistic interpretability: identifying the specific model components responsible for particular tasks. In this work, we adapt circuit discovery methods from mechanistic interpretability to the GNN setting with Mechanistic Interpretability for Neural Algorithmic Reasoning (MINAR). We validate MINAR by applying it to two GNNs: one predicting single-source shortest path distances and another computing shortest path distances and reachability in parallel. Through these examples, we demonstrate how mechanistic interpretability can offer fine-grained insight into an algorithmically aligned model.
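The abstract does not spell out MINAR's procedure, but circuit discovery is commonly framed as activation patching: cache activations from a clean run, splice them into a corrupted run one component at a time, and rank components by how much the output recovers. Below is a minimal, hypothetical sketch of that idea on a toy message-passing GNN; every name here (ToyGNN, the patch triple, the recovery score) is an illustrative assumption, not MINAR's actual method or model.

```python
# Hypothetical sketch of activation patching for circuit discovery in a
# message-passing GNN. NOT the MINAR implementation; all names and the
# toy model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

class ToyGNN:
    """Two rounds of sum-aggregation message passing with a linear update."""
    def __init__(self, dim=4):
        self.W_msg = [rng.normal(size=(dim, dim)) for _ in range(2)]
        self.W_upd = [rng.normal(size=(dim, dim)) for _ in range(2)]

    def forward(self, x, adj, patch=None):
        """Run the GNN, caching per-layer node activations.

        patch: optional (layer, node, vector) triple; overwrites that node's
        hidden state at that layer -- the activation-patching intervention.
        """
        acts, h = [], x
        for layer in range(2):
            msgs = adj @ (h @ self.W_msg[layer])    # aggregate neighbor messages
            h = np.tanh(h @ self.W_upd[layer] + msgs)
            if patch is not None and patch[0] == layer:
                h = h.copy()
                h[patch[1]] = patch[2]              # splice in the cached activation
            acts.append(h)
        return h, acts

# Clean vs. corrupted inputs (e.g., perturbed node features on a graph task).
n, dim = 5, 4
adj = (rng.random((n, n)) < 0.4).astype(float)
x_clean = rng.normal(size=(n, dim))
x_corrupt = x_clean + rng.normal(scale=2.0, size=(n, dim))

model = ToyGNN(dim)
out_clean, acts_clean = model.forward(x_clean, adj)
out_corrupt, _ = model.forward(x_corrupt, adj)

# Patch each (layer, node) activation from the clean run into the corrupted
# run; components whose patch pulls the output back toward the clean output
# are candidate members of the circuit for the task.
baseline_gap = np.linalg.norm(out_corrupt - out_clean)
for layer in range(2):
    for node in range(n):
        patched, _ = model.forward(
            x_corrupt, adj, patch=(layer, node, acts_clean[layer][node]))
        recovery = baseline_gap - np.linalg.norm(patched - out_clean)
        print(f"layer {layer}, node {node}: recovery {recovery:+.3f}")
```

In the shortest-path setting the abstract describes, the analogous intervention would target the components of a trained NAR model (e.g., per-layer message channels) and score them by their effect on predicted distances; the toy recovery score above merely stands in for whatever metric MINAR uses.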
Submission Number: 57