Loss Landscape Degeneracy and Stagewise Development in Transformers

Published: 11 Aug 2025, Last Modified: 11 Aug 2025. Accepted by TMLR. License: CC BY 4.0.
Abstract: Deep learning involves navigating a high-dimensional loss landscape over the neural network parameter space. Over the course of training, complex computational structures form and re-form inside the neural network, leading to shifts in input/output behavior. It is a priority for the science of deep learning to uncover principles governing the development of neural network structure and behavior. Drawing on the framework of singular learning theory, we propose that model development is deeply linked to degeneracy in the local geometry of the loss landscape. We investigate this link by monitoring loss landscape degeneracy throughout training, as quantified by the local learning coefficient, for a transformer language model and an in-context linear regression transformer. We show that training can be divided into distinct periods of change in loss landscape degeneracy, and that these changes in degeneracy coincide with significant changes in the internal computational structure and the input/output behavior of the transformers. This finding provides suggestive evidence that degeneracy and development are linked in transformers, underscoring the potential of a degeneracy-based perspective for understanding modern deep learning.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Fix incorrect hyperparameter values in Tables 1 & 2, and fix error in A.4 caption.
Preceding changes in response to reviews:
- Change to the title (replace "drives" by "and")
- Change to abstract final sentence
- Emphasis on the hypothetical nature of the link between degeneracy and internal structure in the introduction
- Discussion of the LLC shortened in Section 4 and some material moved to appendices
- Added a reference to Odonnat et al.
- Added a paragraph to Appendix A.2 discussing space and time complexity
- Clarified the ICL score in Appendix C.1.3
Video: https://www.youtube.com/watch?v=02ovDX4JlTA
Code: https://github.com/timaeus-research/icl
Assigned Action Editor: ~Erin_Grant1
Submission Number: 4635