Approaching Deep Learning through the Spectral Dynamics of Weights

TMLR Paper 4515 Authors

18 Mar 2025 (modified: 31 Mar 2025) · Under review for TMLR · CC BY 4.0
Abstract: We study the spectral dynamics of weights, i.e., the behavior of singular values and vectors during optimization, and show that they clarify and link many phenomena in deep learning. Through extensive experiments, covering small-scale "grokking" as well as large-scale tasks such as image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers, we identify a consistent bias with three key ingredients. First, singular values evolve unequally, leading to rank minimization. Second, as a result, the top singular vectors stabilize well before the end of training. Third, this happens without the alignment between neighboring layers that several theoretical results rely on. We show how this bias tracks the transition to generalization in grokking. More generally, we demonstrate that weight decay enhances rank minimization beyond its role as a norm regularizer in practical systems. Moreover, these spectral dynamics distinguish training on random labels from training on true labels, offering a novel perspective on this longstanding conundrum. They also reveal structure in well-performing sparse subnetworks (lottery tickets) and in the shape of the loss surface through linear mode connectivity. Our findings suggest that spectral dynamics provide a coherent view that links the behavior of neural networks across diverse settings.
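To make the abstract's quantities concrete, below is a minimal sketch (not the authors' exact protocol; all function names and thresholds are illustrative assumptions) of how one might track spectral dynamics across training checkpoints: the singular-value spectrum of a weight matrix, an entropy-based effective rank as a proxy for rank minimization, and the agreement between top singular vectors at two checkpoints as a measure of early stabilization.

```python
# Illustrative sketch of tracking "spectral dynamics of weights" across
# checkpoints. Function names, the effective-rank definition, and k are
# assumptions, not the paper's stated measurement protocol.
import torch

def effective_rank(s: torch.Tensor, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a singular-value spectrum s."""
    p = s / (s.sum() + eps)          # normalize spectrum to a distribution
    entropy = -(p * (p + eps).log()).sum()
    return float(entropy.exp())      # exp(entropy) in [1, len(s)]

def top_subspace_agreement(W_early: torch.Tensor,
                           W_late: torch.Tensor,
                           k: int = 8) -> torch.Tensor:
    """|U_early^T U_late| over the top-k left singular vectors.
    Diagonal values near 1 indicate the top directions stabilized early."""
    U0, _, _ = torch.linalg.svd(W_early, full_matrices=False)
    U1, _, _ = torch.linalg.svd(W_late, full_matrices=False)
    return (U0[:, :k].T @ U1[:, :k]).abs()

# Toy usage: a random "early" weight matrix and a slightly perturbed "late" one.
torch.manual_seed(0)
W_early = torch.randn(256, 128)
W_late = W_early + 0.05 * torch.randn(256, 128)

s = torch.linalg.svdvals(W_late)
print("effective rank:", effective_rank(s))
print("top-8 agreement (diagonal):", top_subspace_agreement(W_early, W_late).diag())
```

Under this sketch, rank minimization would appear as a decreasing effective rank over checkpoints, and singular-vector stabilization as diagonal agreement values approaching 1 well before training ends.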
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jeffrey_Pennington1
Submission Number: 4515
