Embryology of a Language Model

George Wang; Garrett Baker; Andrew Gordon; Daniel Murfet

19 Sept 2025 (modified: 25 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: Interpretability, Language Model, Singular Learning Theory

TL;DR: We study the development of a small language model over training using a new interpretability technique called susceptibilities.

Abstract: Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. While susceptibilities, drawn from statistical physics, offer a promising analytical tool, their full potential for visualizing network organization remains untapped. In this work, we introduce an embryological approach, applying UMAP to the susceptibility matrix to visualize the model's structural development over training. Our visualizations reveal the emergence of a clear "body plan," charting the formation of known features like the induction circuit and discovering previously unknown structures, such as a "spacing fin" dedicated to counting space tokens. This work demonstrates that susceptibility analysis can move beyond validation to uncover novel mechanisms, providing a powerful, holistic lens for studying the developmental principles of complex neural networks.
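The pipeline the abstract describes (estimate susceptibilities, arrange them as a matrix, embed its rows in two dimensions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. We assume rows of the matrix index token contexts and columns index model components (for example, attention heads). Each entry is estimated with the standard statistical-physics fluctuation-dissipation identity: for a Gibbs weight proportional to exp(-β(H(w) + hV(w))), the response of E_h[φ] to h at h = 0 equals -β Cov[φ, V], computed here from S posterior samples w_1, ..., w_S. The sign and scale conventions in the paper may differ. Finally, UMAP is swapped for a PCA projection via SVD so that the sketch needs only NumPy.

```python
import numpy as np


def susceptibility_matrix(phi, dloss, beta=1.0):
    """Hypothetical helper: estimate chi[t, c] = -beta * Cov_w[phi_c(w), dloss_t(w)].

    phi:   (S, C) array, observable for component c at posterior sample w_s
    dloss: (S, T) array, loss perturbation for token context t at sample w_s
    Returns a (T, C) matrix whose row t is the susceptibility profile of context t.
    """
    S = phi.shape[0]
    if dloss.shape[0] != S or S < 2:
        raise ValueError("phi and dloss need the same number (>= 2) of samples")
    phi_c = phi - phi.mean(axis=0, keepdims=True)        # centre over samples
    dloss_c = dloss - dloss.mean(axis=0, keepdims=True)
    cov = dloss_c.T @ phi_c / (S - 1)                    # unbiased sample covariance, (T, C)
    return -beta * cov


def embed_2d(chi):
    """Project rows of chi to 2-D with PCA (a stand-in for UMAP, not UMAP itself)."""
    X = chi - chi.mean(axis=0, keepdims=True)            # centre each component column
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                                  # coordinates on the top-2 principal axes


# Usage with synthetic stand-in data: 200 samples, 8 components, 50 contexts.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 8))
dloss = phi @ rng.normal(size=(8, 50)) + 0.1 * rng.normal(size=(200, 50))
chi = susceptibility_matrix(phi, dloss)
coords = embed_2d(chi)
print(chi.shape, coords.shape)  # (50, 8) (50, 2)
```

With the umap-learn package installed, `umap.UMAP(n_components=2).fit_transform(chi)` would produce the nonlinear embedding the abstract refers to in place of `embed_2d`. The synthetic `phi` and `dloss` arrays only exercise the shapes and arithmetic; they carry no information about a real model.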

Primary Area: interpretability and explainable AI

Submission Number: 20212