Cracking the Hessian: Closed-Form Hessian Spectra for Fundamental Neural Networks

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Hessian, Optimization, Closed-form, Deep Learning, Theory
TL;DR: We derive the first closed-form Hessian spectra for simple but fundamental neural networks, revealing new structural insights and proof techniques with implications for theory and practice.
Abstract: The Hessian and its spectrum hold significant theoretical and practical relevance for building optimizers, measuring generalization, compressing models, and more. Prior works have characterized the Hessian through its spectral density, rank, and the outlier-bulk structure of its spectrum, often relying on approximations. However, the precise behavior of Hessian eigenvalues and eigenvectors remains unclear, owing both to the absence of closed-form results for non-trivial neural networks and to the computational expense of empirical estimation. In this work, we derive closed-form expressions for all Hessian eigenvalues and eigenvectors in two-layer linear and ReLU networks with scalar input, arbitrary hidden width, and a loss aggregated over any number of samples. We further provide closed-form eigenvalues for the core component of Transformer architectures: a single self-attention layer with arbitrary sequence length. Our results reveal a previously undiscovered 'paired' structure of outlier eigenvalues, a cell-wise decomposition of the Hessian spectrum with ReLU, and the sensitivity of the Hessian condition number to the query and key matrix norms, as well as the presence of attention sinks. We complement these findings with experiments beyond the assumed model setting, showing strong correlation between the largest eigenvalue and the spectral norm of weight matrices, and empirical evidence that the paired eigenvalue structure persists more generally. Overall, by establishing these closed forms for the first time and introducing the corresponding proof technique, we advance our understanding of the Hessian and open new avenues for its use.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 23672
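
For readers who want a concrete handle on the simplest setting named in the abstract, the following is a minimal numerical sketch, not the paper's derivation or its closed-form result. It assumes a two-layer linear network with scalar input and scalar output, f(x) = (v . w) x, and a mean squared-error loss; the output dimension, the loss, and the function names `loss` and `hessian` are all assumptions made for illustration. Under those assumptions the Hessian follows from the chain rule and its spectrum can be inspected directly.

```python
import numpy as np

def loss(theta, x, y):
    """Assumed loss: 0.5 * mean((f(x) - y)^2) with f(x) = (v . w) * x.

    theta is the concatenation [w, v] of the two weight vectors (width m each).
    """
    m = theta.size // 2
    w, v = theta[:m], theta[m:]
    return 0.5 * np.mean(((v @ w) * x - y) ** 2)

def hessian(theta, x, y):
    """Hessian of the assumed loss with respect to theta = [w, v].

    With p = v . w, s = mean(x^2) and r = s * p - mean(x * y):
      dL/dp = r,  d2L/dp2 = s,  grad p = [v, w],  Hessian of p = [[0, I], [I, 0]],
    so the chain rule gives H = s * g g^T + r * [[0, I], [I, 0]] with g = [v, w].
    """
    m = theta.size // 2
    w, v = theta[:m], theta[m:]
    s = np.mean(x ** 2)
    r = s * (v @ w) - np.mean(x * y)
    g = np.concatenate([v, w])
    swap = np.block([[np.zeros((m, m)), np.eye(m)],
                     [np.eye(m), np.zeros((m, m))]])
    return s * np.outer(g, g) + r * swap

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=8), rng.normal(size=8)   # 8 scalar samples
    theta = rng.normal(size=6)                      # hidden width m = 3
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    print(np.linalg.eigvalsh(hessian(theta, x, y)))
```

The sketch only covers the linear case; the ReLU and self-attention settings discussed in the abstract are not reproduced here.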