On the Power-Law Hessian Spectra in Deep Learning

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Deep Learning, Loss Landscape, Hessian
TL;DR: We are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures that relate critically to multiple behaviors of deep learning.
Abstract: It is well known that the Hessian of the deep loss landscape matters to the optimization, generalization, and even robustness of deep learning. Recent works have empirically discovered that the Hessian spectrum in deep learning has a two-component structure, consisting of a small number of large eigenvalues and a large number of nearly zero eigenvalues. However, the mathematical structure behind the Hessian spectra remains under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures. Inspired by statistical physics, we provide a maximum-entropy theoretical interpretation that explains why the power-law structure exists. Our extensive experiments using the novel power-law spectral method reveal that power-law Hessian spectra relate critically to multiple important behaviors of deep learning, including optimization, generalization, overparameterization, and overfitting.
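The central claim, that the sorted Hessian eigenvalues of a well-trained network decay approximately as a power law in rank, λ_k ≈ λ_1 k^{-s}, can be checked directly on a small model. Below is a minimal sketch (not the authors' code): it trains a tiny MLP on synthetic data, forms the exact Hessian of the loss with respect to the flattened parameters, and fits the exponent s by least squares on the log-log plot. The model, data, training budget, eigenvalue cutoff, and fitting range are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny model and synthetic regression task, so the full (p x p) Hessian is cheap.
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(), nn.Linear(8, 1))
X, y = torch.randn(64, 5), torch.randn(64, 1)
loss_fn = nn.MSELoss()

# Roughly train to a minimum first; the power-law claim is about *well-trained* nets.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(2000):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

def loss_from_flat(flat):
    # Evaluate the loss with parameters taken from a single flat vector,
    # so torch.autograd.functional.hessian sees one input tensor.
    out, idx = X, 0
    for module in model:
        if isinstance(module, nn.Linear):
            w_n, b_n = module.weight.numel(), module.bias.numel()
            W = flat[idx:idx + w_n].view_as(module.weight)
            b = flat[idx + w_n:idx + w_n + b_n].view_as(module.bias)
            out, idx = out @ W.t() + b, idx + w_n + b_n
        else:
            out = module(out)
    return loss_fn(out, y)

flat0 = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
H = torch.autograd.functional.hessian(loss_from_flat, flat0)

# Sort eigenvalues in descending order; keep only the clearly positive ones
# (the nearly-zero bulk is excluded from the fit -- an illustrative cutoff).
eigs = torch.linalg.eigvalsh(H).flip(0)
lam = eigs[eigs > 1e-6]
k = torch.arange(1, lam.numel() + 1, dtype=torch.float)

# Least-squares fit of log(lambda_k) = log(lambda_1) - s * log(k).
A = torch.stack([torch.ones_like(k), torch.log(k)], dim=1)
coef = torch.linalg.lstsq(A, torch.log(lam).unsqueeze(1)).solution
print(f"kept {lam.numel()} eigenvalues, fitted exponent s = {-coef[1].item():.3f}")
```

For realistic networks the exact Hessian is intractable; one would instead estimate the top eigenvalues with an iterative method such as Lanczos and fit the power law over that leading part of the spectrum.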
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning