Spectral Dynamics in Neural Network Training: Mathematical Foundations for Understanding Representational Development
Keywords: Developmental interpretability, Foundational work
TL;DR: We propose a matrix-valued SDE view of SGD that describes singular-value repulsion and gamma-like bulk+tail spectra; in experiments we observe qualitative agreement and obtain a simple way to track leading singular values during training.
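A minimal sketch of the kind of SDE this refers to (our illustrative reading, with an assumed quadratic potential, noise temperature $T$, repulsion strength $\beta$, and Bessel-type index $\nu$; these coefficients are assumptions, not the paper's exact derivation): the singular values $s_1, \dots, s_m$ of a weight matrix evolve as

\[
  \mathrm{d}s_i = \Big( -\partial_{s_i} U(s_i) + \frac{\beta}{2} \sum_{j \neq i} \Big( \frac{1}{s_i - s_j} + \frac{1}{s_i + s_j} \Big) + \frac{\nu}{2 s_i} \Big)\,\mathrm{d}t + \sqrt{2T}\,\mathrm{d}B_i,
\]

where the pairwise $1/(s_i - s_j)$ terms produce the Dyson-type repulsion, and for $U(s) = s^2/2$ the stationary law of $\lambda_i = s_i^2$ is a gamma-type density,

\[
  p(\lambda) \propto \lambda^{\alpha - 1} e^{-\lambda/\theta},
\]

the "bulk+tail" shape referred to above.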
Abstract: Understanding the mathematical foundations underlying neural network training dynamics is essential for mechanistic interpretability research. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects SGD optimization to the evolution of spectral structure in weight matrices. We derive exact SDEs showing that singular values follow Dyson Brownian motion with eigenvalue repulsion, and we characterize stationary distributions as gamma-type densities with power-law tails that explain the empirically observed "bulk+tail" spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution. This gives mechanistic interpretability researchers a mathematical framework for predicting when interpretable structure emerges during training and for monitoring the development of internal representations.
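To make the monitoring claim concrete, here is a minimal sketch of spectral tracking during training, assuming PyTorch; the function name and loop variables are hypothetical illustrations, not the paper's released code.

import torch

@torch.no_grad()
def top_singular_values(model: torch.nn.Module, k: int = 8) -> dict[str, torch.Tensor]:
    """Return the k largest singular values of every 2-D weight matrix."""
    spectra = {}
    for name, param in model.named_parameters():
        if param.ndim == 2:  # weight matrices only; skip biases and norm scales
            # svdvals returns singular values in descending order
            s = torch.linalg.svdvals(param.detach().float())
            spectra[name] = s[:k].cpu()
    return spectra

# Usage inside a training loop (sketch):
# history = []
# for step, batch in enumerate(loader):
#     ...  # forward, backward, optimizer step
#     if step % log_every == 0:
#         history.append((step, top_singular_values(model)))

Logging the top-k values at fixed intervals is enough to plot the leading spectral trajectories and compare them against an SDE-based forecast.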
Submission Number: 265