Spectral Dynamics in Neural Network Training: Mathematical Foundations for Understanding Representational Development
Keywords: Developmental interpretability, Foundational work
TL;DR: We propose a matrix-valued SDE view of SGD that describes singular-value repulsion and gamma-like bulk+tail spectra; in experiments we observe qualitative agreement and obtain a simple way to track leading singular values during training.
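A minimal sketch of the kind of SDE this refers to (our illustrative reading, with an assumed quadratic potential, noise temperature $T$, repulsion strength $\beta$, and Bessel-type index $\nu$; these coefficients are assumptions, not the paper's exact derivation): the singular values $s_1, \dots, s_m$ of a weight matrix evolve as

\[
  \mathrm{d}s_i = \Big( -\partial_{s_i} U(s_i) + \frac{\beta}{2} \sum_{j \neq i} \Big( \frac{1}{s_i - s_j} + \frac{1}{s_i + s_j} \Big) + \frac{\nu}{2 s_i} \Big)\,\mathrm{d}t + \sqrt{2T}\,\mathrm{d}B_i,
\]

where the pairwise $1/(s_i - s_j)$ terms produce the Dyson-type repulsion, and for $U(s) = s^2/2$ the stationary law of $\lambda_i = s_i^2$ is a gamma-type density,

\[
  p(\lambda) \propto \lambda^{\alpha - 1} e^{-\lambda/\theta},
\]

the "bulk+tail" shape referred to above.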
Abstract: Understanding the mathematical foundations underlying neural network training dynamics is essential for mechanistic interpretability research. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects SGD optimization to the evolution of spectral structure in weight matrices. We derive exact SDEs showing that singular values follow Dyson Brownian motion with eigenvalue repulsion, and we characterize stationary distributions as gamma-type densities with power-law tails that explain the empirically observed "bulk+tail" spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution. This gives mechanistic interpretability researchers a mathematical framework for predicting when interpretable structure emerges during training and for monitoring the development of internal representations.
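To make the monitoring claim concrete, here is a minimal sketch of spectral tracking during training, assuming PyTorch; the function name and loop variables are hypothetical illustrations, not the paper's released code.

import torch

@torch.no_grad()
def top_singular_values(model: torch.nn.Module, k: int = 8) -> dict[str, torch.Tensor]:
    """Return the k largest singular values of every 2-D weight matrix."""
    spectra = {}
    for name, param in model.named_parameters():
        if param.ndim == 2:  # weight matrices only; skip biases and norm scales
            # svdvals returns singular values in descending order
            s = torch.linalg.svdvals(param.detach().float())
            spectra[name] = s[:k].cpu()
    return spectra

# Usage inside a training loop (sketch):
# history = []
# for step, batch in enumerate(loader):
#     ...  # forward, backward, optimizer step
#     if step % log_every == 0:
#         history.append((step, top_singular_values(model)))

Logging the top-k values at fixed intervals is enough to plot the leading spectral trajectories and compare them against an SDE-based forecast.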
Submission Number: 265