Keywords: Heavy Tails, Spectral Analysis, Generalization
TL;DR: We analyze how heavy tails emerge in the weight spectrum of shallow neural networks and how they relate to generalization.
Abstract: Training strategies for modern deep neural networks (NNs) tend to induce a heavy-tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. In particular, understanding the conditions that lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple yet rich setting that models the emergence of HT ESDs. Specifically, we present a theory-informed analysis for 'crafting' heavy tails in the ESD of two-layer NNs without any gradient noise. This is the first work to analyze a noise-free setting and to incorporate optimizer-dependent (GD/Adam) large learning rates into the HT ESD analysis. Our results highlight the role of learning rates in shaping the Bulk+Spike and HT forms of the ESDs in the early phase of training, which can facilitate generalization in the two-layer NN. These observations offer insight into the behavior of large-scale NNs, albeit in a much simpler setting. Last but not least, we present a novel perspective on the ESD evolution dynamics by analyzing the singular vectors of weight matrices and optimizer updates.
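To make "HT ESD" concrete, here is a minimal sketch, assuming NumPy, of one common way to measure it: take the eigenvalues of the correlation matrix W^T W / n of a layer's n x d weight matrix W, then estimate a power-law exponent alpha for the upper tail with a Hill-type estimator (smaller alpha means a heavier tail). The function names, the W^T W / n normalization, the choice of k, and the use of random Gaussian versus Student-t matrices as stand-ins for trained weights are all illustrative assumptions. This is not the authors' procedure, and it only measures a tail rather than modeling how GD/Adam training produces one.

```python
import numpy as np

def empirical_spectral_density(W):
    """Eigenvalues of the correlation matrix W^T W / n for an n x d weight matrix W."""
    n = W.shape[0]
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    return np.linalg.eigvalsh(W.T @ W / n)

def hill_tail_exponent(eigs, k):
    """Hill-type estimate of the density exponent alpha in p(x) ~ x^(-alpha),
    using the k largest eigenvalues and the (k+1)-th largest as the threshold."""
    if not 0 < k < len(eigs):
        raise ValueError("k must satisfy 0 < k < number of eigenvalues")
    x = np.sort(eigs)[::-1]              # descending order
    top, threshold = x[:k], x[k]
    # the Hill estimate targets the survival exponent; +1 converts it to the density exponent
    return 1.0 + k / np.sum(np.log(top / threshold))

# Illustrative comparison on synthetic matrices (not trained weights):
rng = np.random.default_rng(0)
n, d, k = 2000, 500, 50
W_gauss = rng.standard_normal((n, d))            # light-tailed entries: bounded Marchenko-Pastur bulk
W_heavy = rng.standard_t(df=2.0, size=(n, d))    # heavy-tailed entries (infinite variance)
alpha_gauss = hill_tail_exponent(empirical_spectral_density(W_gauss), k)
alpha_heavy = hill_tail_exponent(empirical_spectral_density(W_heavy), k)
print(f"alpha (Gaussian): {alpha_gauss:.2f}, alpha (Student-t): {alpha_heavy:.2f}")
```

The Gaussian matrix has a compactly supported bulk, so its top eigenvalues sit close together and the estimated alpha is large; the heavy-tailed matrix spreads its top eigenvalues over orders of magnitude, which drives alpha down. A Bulk+Spike spectrum would instead show a few isolated outliers above an otherwise bounded bulk, which a single tail exponent summarizes poorly.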
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8643