Keywords: graph neural networks, neural algorithmic reasoning, neural compilation, mechanistic interpretability, algorithm learning, reasoning, expressivity, neurosymbolic
TL;DR: We introduce neural compilation for GATv2 and propose two measures of mechanistic faithfulness, which we validate experimentally, showing that even algorithmically aligned and parallel models struggle to learn faithfully.
Abstract: Neural networks can achieve high prediction accuracy on algorithmic reasoning tasks, yet even effective models fail to faithfully replicate ground-truth mechanisms, even though the training data contains sufficient information to learn the underlying algorithms.
We refer to this as the \textit{mechanistic gap} and analyze it by introducing neural compilation for GNNs, a novel technique that analytically encodes source algorithms into network parameters, enabling exact computation and direct comparison with conventionally trained models.
Specifically, we analyze graph attention networks (GATv2) because of their high performance on algorithmic reasoning, their mathematical similarity to the transformer architecture, and their established use in augmenting transformers for neural algorithmic reasoning (NAR).
Our analysis covers three algorithms from the CLRS algorithmic reasoning benchmark, BFS, DFS, and Bellman-Ford, which span a range of learning effectiveness and algorithmic alignment.
We quantify faithfulness in two ways: externally, via trace predictions, and internally, via attention-mechanism similarity.
We demonstrate that mechanistic gaps arise even for algorithmically aligned, parallel algorithms such as BFS, where trained models achieve near-perfect accuracy yet deviate internally from their compiled counterparts.
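To make the idea of "compiling" an algorithm into network parameters concrete, here is a minimal sketch (not the authors' implementation, which targets GATv2 attention and value weights): parallel BFS reachability expressed as a message-passing layer whose behavior is fixed analytically rather than learned. The function names are hypothetical and the max-aggregation stands in for the hard attention a compiled layer would place on reached neighbours.

```python
import numpy as np

def compiled_bfs_layer(adj: np.ndarray, reached: np.ndarray) -> np.ndarray:
    """One round of parallel BFS: a node is reached if it was already reached
    or if any neighbour was reached. Max-aggregation over neighbours plays the
    role of the hard attention a compiled attention layer would produce."""
    neighbour_signal = (adj * reached[None, :]).max(axis=1)  # max over each node's neighbours
    return np.maximum(reached, neighbour_signal)

def compiled_bfs(adj: np.ndarray, source: int) -> np.ndarray:
    """Run the fixed (compiled) layer until convergence; n-1 rounds suffice."""
    n = adj.shape[0]
    reached = np.zeros(n)
    reached[source] = 1.0
    for _ in range(n):
        reached = compiled_bfs_layer(adj, reached)
    return reached

# Toy graph: edges 0-1 and 1-2; node 3 is isolated.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
print(compiled_bfs(adj, source=0))  # [1. 1. 1. 0.]
```

Comparing such a fixed construction against a conventionally trained model, both on its trace predictions and on where its attention concentrates, is one way to read the two faithfulness measures described above.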
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 22518