Embryology of a Language Model

TMLR Paper7188 Authors

27 Jan 2026 (modified: 18 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. We study this development through an embryological lens, applying UMAP to susceptibility vectors to visualize structural organization over training. We observe the emergence of a striking "body plan", the rainbow serpent, with an anterior-posterior axis defined by global expression versus suppression, dorsal-ventral stratification corresponding to the induction circuit, and a novel "spacing fin" structure. This body plan is reproducible across random seeds, suggesting that high-level functional organization is determined by architecture and data rather than by initialization. Our work demonstrates that the relationship between data and internal structure is legible and developmental, with implications for both understanding and guiding model development.
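
A minimal sketch of the visualization step the abstract describes: projecting susceptibility vectors to 2D with umap-learn. The array shapes, the per-component interpretation, and the random data are illustrative assumptions, not the paper's actual pipeline.

# A minimal sketch (not the authors' pipeline) of projecting
# susceptibility vectors with UMAP; shapes and data are hypothetical.
import numpy as np
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
# Hypothetical stand-in: 512 model components (e.g., attention heads,
# MLP units), each with a 64-dimensional susceptibility vector that
# records how the component responds to perturbations of the data.
susceptibility = rng.normal(size=(512, 64))

# Reduce to 2D so components with similar susceptibility profiles land
# near each other; a structure like the "body plan" would be read off
# such embeddings computed across training checkpoints.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                    random_state=0)
embedding = reducer.fit_transform(susceptibility)  # shape (512, 2)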
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Alexander_S_Ecker1
Submission Number: 7188