Keywords: Mechanistic Interpretability, Language Models, Chess, World Models
TL;DR: This work investigates how a GPT-2-style transformer trained on chess computes linear board representations.
Abstract: The field of mechanistic interpretability seeks to understand the internal workings of neural networks, particularly language models. While previous research has demonstrated that language models trained on games can develop linear board representations, the mechanisms by which these representations arise remain unknown. This work investigates the internal workings of a GPT-2-style transformer trained on chess PGNs and proposes an algorithm for how the model computes the board state.
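For context on the abstract's mention of "linear board representations": the claim is that the board state can be read out of the model's activations with a single linear map. The sketch below is a minimal, hypothetical illustration of such a linear probe in PyTorch; it is not the submission's code, and the dimensions, class count, and names (D_MODEL, LinearBoardProbe) are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions (assumptions, not taken from the paper):
# D_MODEL   - width of the transformer's residual stream
# N_SQUARES - 64 squares on the chess board
# N_CLASSES - 13 piece classes per square (6 white, 6 black, empty)
D_MODEL, N_SQUARES, N_CLASSES = 512, 64, 13

class LinearBoardProbe(nn.Module):
    """A single linear map from residual-stream activations to per-square piece logits."""
    def __init__(self, d_model: int = D_MODEL):
        super().__init__()
        self.probe = nn.Linear(d_model, N_SQUARES * N_CLASSES)

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # resid: (batch, d_model) activations taken at a move token
        logits = self.probe(resid)                    # (batch, 64 * 13)
        return logits.view(-1, N_SQUARES, N_CLASSES)  # (batch, 64, 13)

# Usage with random activations standing in for a real layer's residual stream.
probe = LinearBoardProbe()
resid = torch.randn(8, D_MODEL)                # 8 positions from a PGN move sequence
board_logits = probe(resid)                    # (8, 64, 13)
predicted_board = board_logits.argmax(dim=-1)  # predicted piece class per square
print(predicted_board.shape)                   # torch.Size([8, 64])
```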
Code: zip
Submission Number: 49