Iterative Inference in a Chess-Playing Neural Network

Published: 30 Sept 2025, Last Modified: 09 Nov 2025, Mech Interp Workshop (NeurIPS 2025) Poster, CC BY 4.0
Open Source Links: https://github.com/hartigel/leela-logit-lens, https://figshare.com/s/5342980a9ba8b26985a9
Keywords: Understanding high-level properties of models, Probing
Other Keywords: logit lens, chess, iterative inference, algorithmic vs. heuristic reasoning, layer-wise concept preferences
TL;DR: We extend the logit lens to Leela Chess Zero's policy network and find that capability develops in distinct phases with substantial move reordering: correct puzzle solutions found in middle layers are sometimes overridden by safer alternatives.
Abstract: Do neural networks build their representations through smooth, gradual refinement, or via more complex computational processes? We investigate this by extending the logit lens to analyze the policy network of Leela Chess Zero, a superhuman chess engine. Although playing strength and puzzle-solving ability improve consistently across layers, capability progression occurs in distinct computational phases, and move preferences are continuously reevaluated: move rankings remain poorly correlated with final outputs until late, and correct puzzle solutions found in middle layers are sometimes overridden. Concept preference analyses show that the final layers prioritize safety over aggression, suggesting a mechanism by which heuristic priors can override tactical solutions in this late-layer reversal.
Submission Number: 287
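
The layer-wise analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy residual stream and a plain linear policy head (`W`, `b`) standing in for Leela Chess Zero's actual policy head, and every name in it (`logit_lens`, `spearman`, `move_rank`, the random data) is hypothetical. The idea is to decode each intermediate layer with the final policy head, then compare each layer's move ranking with the final ranking and track where a chosen move sits at every depth.

```python
import numpy as np

def ranks(x):
    """Rank of each entry (0 = smallest). Assumes no ties."""
    order = np.argsort(x)
    r = np.empty_like(order)
    r[order] = np.arange(len(x))
    return r

def spearman(a, b):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    ra, rb = ranks(a).astype(float), ranks(b).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def logit_lens(hidden_states, W, b):
    """Decode every layer's activations with the (assumed linear) final policy head."""
    return hidden_states @ W + b

def move_rank(logits, move_idx):
    """Position of a move in descending logit order (0 = top choice)."""
    return int(np.sum(logits > logits[move_idx]))

# Toy data: random stand-ins for real activations and head weights.
rng = np.random.default_rng(0)
n_layers, d_model, n_moves = 6, 16, 10
# Each layer adds a random update to the residual stream.
hidden = np.cumsum(rng.normal(size=(n_layers + 1, d_model)), axis=0)
W = rng.normal(size=(d_model, n_moves))
b = rng.normal(size=n_moves)

layer_logits = logit_lens(hidden, W, b)   # shape: (n_layers + 1, n_moves)
final = layer_logits[-1]

# How well does each layer's move ranking agree with the final output?
correlations = [spearman(l, final) for l in layer_logits]

# Track a hypothetical "puzzle solution": here, the move a middle layer prefers.
solution = int(np.argmax(layer_logits[3]))
solution_ranks = [move_rank(l, solution) for l in layer_logits]

for i, (c, r) in enumerate(zip(correlations, solution_ranks)):
    print(f"layer {i}: rank corr with final = {c:+.2f}, solution rank = {r}")
```

In this framing, a move whose rank is 0 at a middle layer but greater than 0 at the final layer corresponds to the "found, then overridden" pattern the abstract describes. With real activations, a normalization step before the head would likely be needed as well; that detail is omitted here.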