Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

Vignesh Kothapalli; Tianyu Pang; Shenyang Deng; Zongmin Liu; Yaoqing Yang

Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Heavy Tails, Spectral Analysis, Generalization

TL;DR: We analyze how heavy tails emerge in the weight spectrum of shallow neural networks and their relationship with generalization.

Abstract: Training strategies for modern deep neural networks (NNs) tend to induce a heavy- tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. Especially, understanding the conditions which lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple, rich setting to model the emergence of HT ESD. In particular, we present a theory-informed analysis for 'crafting' heavy tails in the ESD of two-layer NNs without any gradient noise. This is the first work to analyze a noise-free setting and incorporate optimizer (GD/Adam) dependent (large) learning rates into the HT ESD analysis. Our results highlight the role of learning rates on the Bulk+Spike and HT shape of the ESDs in the early phase of training, which can facilitate generalization in the two-layer NN. These observations shed light on the behavior of large-scale NNs, albeit in a much simpler setting. Last but not least, we present a novel perspective on the ESD evolution dynamics by analyzing the singular vectors of weight matrices and optimizer updates

Supplementary Material: zip

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8643

Loading