Embryology of a Language Model

ICLR 2026 Conference Submission 20212 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: Interpretability, Language Model, Singular Learning Theory
TL;DR: We study how a small language model develops over training using susceptibilities, a new interpretability technique.
Abstract: Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. While susceptibilities, drawn from statistical physics, offer a promising analytical tool, their full potential for visualizing network organization remains untapped. In this work, we introduce an embryological approach, applying UMAP to the susceptibility matrix to visualize the model's structural development over training. Our visualizations reveal the emergence of a clear "body plan," charting the formation of known features like the induction circuit and discovering previously unknown structures, such as a "spacing fin" dedicated to counting space tokens. This work demonstrates that susceptibility analysis can move beyond validation to uncover novel mechanisms, providing a powerful, holistic lens for studying the developmental principles of complex neural networks.
Primary Area: interpretability and explainable AI
Submission Number: 20212