On the Power-Law Hessian Spectra in Deep Learning

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Deep Learning, Loss Landscape, Hessian
TL;DR: We are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures that relate critically to multiple behaviors of deep learning.
Abstract: It is well known that the Hessian of the deep loss landscape matters to the optimization, generalization, and even robustness of deep learning. Recent works have empirically discovered that the Hessian spectrum in deep learning has a two-component structure, consisting of a small number of large eigenvalues and a large number of nearly zero eigenvalues. However, the mathematical structure behind the Hessian spectra remains under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures. Inspired by statistical physics, we provide a maximum-entropy theoretical interpretation that explains why the power-law structure exists. Our extensive experiments using the novel power-law spectral method reveal that power-law Hessian spectra relate critically to multiple important behaviors of deep learning, including optimization, generalization, overparameterization, and overfitting.
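The central claim, that the sorted Hessian eigenvalues of a well-trained network decay approximately as a power law in rank, λ_k ≈ λ_1 k^{-s}, can be checked directly on a small model. Below is a minimal sketch (not the authors' code): it trains a tiny MLP on synthetic data, forms the exact Hessian of the loss with respect to the flattened parameters, and fits the exponent s by least squares on the log-log plot. The model, data, training budget, eigenvalue cutoff, and fitting range are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny model and synthetic regression task, so the full (p x p) Hessian is cheap.
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(), nn.Linear(8, 1))
X, y = torch.randn(64, 5), torch.randn(64, 1)
loss_fn = nn.MSELoss()

# Roughly train to a minimum first; the power-law claim is about *well-trained* nets.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(2000):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

def loss_from_flat(flat):
    # Evaluate the loss with parameters taken from a single flat vector,
    # so torch.autograd.functional.hessian sees one input tensor.
    out, idx = X, 0
    for module in model:
        if isinstance(module, nn.Linear):
            w_n, b_n = module.weight.numel(), module.bias.numel()
            W = flat[idx:idx + w_n].view_as(module.weight)
            b = flat[idx + w_n:idx + w_n + b_n].view_as(module.bias)
            out, idx = out @ W.t() + b, idx + w_n + b_n
        else:
            out = module(out)
    return loss_fn(out, y)

flat0 = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
H = torch.autograd.functional.hessian(loss_from_flat, flat0)

# Sort eigenvalues in descending order; keep only the clearly positive ones
# (the nearly-zero bulk is excluded from the fit -- an illustrative cutoff).
eigs = torch.linalg.eigvalsh(H).flip(0)
lam = eigs[eigs > 1e-6]
k = torch.arange(1, lam.numel() + 1, dtype=torch.float)

# Least-squares fit of log(lambda_k) = log(lambda_1) - s * log(k).
A = torch.stack([torch.ones_like(k), torch.log(k)], dim=1)
coef = torch.linalg.lstsq(A, torch.log(lam).unsqueeze(1)).solution
print(f"kept {lam.numel()} eigenvalues, fitted exponent s = {-coef[1].item():.3f}")
```

For realistic networks the exact Hessian is intractable; one would instead estimate the top eigenvalues with an iterative method such as Lanczos and fit the power law over that leading part of the spectrum.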
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning