- TL;DR: Understanding the eigenvalues of the neural network Hessian under the data-generating distribution.
- Abstract: The geometric properties of loss surfaces, such as the local flatness of a solution, are associated with generalization in deep learning. The Hessian is often used to understand these geometric properties. We investigate the differences between the eigenvalues of the neural network Hessian evaluated over the empirical dataset, which we term the Empirical Hessian, and the eigenvalues of the Hessian under the data-generating distribution, which we term the True Hessian. Under mild assumptions, we use random matrix theory to show that the True Hessian has eigenvalues of smaller absolute value than the Empirical Hessian. We support these results empirically for different SGD schedules on both a 110-layer ResNet and VGG-16. To perform these experiments, we propose a framework for spectral visualization based on GPU-accelerated stochastic Lanczos quadrature. This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization and can be used generically to investigate the spectral properties of matrices in deep learning.
- Code: https://drive.google.com/file/d/1JxmWXjMjJ12SooCZ4rxWHJVWIqAkekGY/view
- Keywords: random matrix theory, deep learning, deep learning theory, Hessian eigenvalues, true risk
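
The abstract names stochastic Lanczos quadrature as the basis of its spectral visualization framework. As a rough illustration of that general technique, and not the authors' GPU implementation, the following NumPy sketch approximates the spectral density of a symmetric matrix using only matrix-vector products. The function names (`lanczos`, `slq_nodes_weights`), the full reorthogonalization step, and the random symmetric matrix standing in for a Hessian are all assumptions made for this example; for a real network, `matvec` would presumably be a Hessian-vector product computed by automatic differentiation.

```python
import numpy as np


def lanczos(matvec, dim, num_steps, rng):
    """Build the Lanczos tridiagonal matrix from a random unit start vector.

    Uses full reorthogonalization (an assumption made here for numerical
    stability; it costs O(dim * num_steps) memory).
    """
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    basis = [v]
    alphas, betas = [], []
    for k in range(num_steps):
        w = matvec(basis[-1])
        alphas.append(basis[-1] @ w)
        # Remove the components of w along every previous Lanczos vector.
        for u in basis:
            w = w - (u @ w) * u
        beta = np.linalg.norm(w)
        if k == num_steps - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append(w / beta)
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)


def slq_nodes_weights(matvec, dim, num_steps=30, num_probes=10, seed=0):
    """Return quadrature nodes and weights approximating the spectral density.

    Each probe contributes the eigenvalues of its tridiagonal matrix as nodes
    and the squared first components of the eigenvectors as weights.
    """
    rng = np.random.default_rng(seed)
    nodes, weights = [], []
    for _ in range(num_probes):
        tridiag = lanczos(matvec, dim, num_steps, rng)
        evals, evecs = np.linalg.eigh(tridiag)
        nodes.append(evals)
        weights.append(evecs[0] ** 2 / num_probes)
    return np.concatenate(nodes), np.concatenate(weights)


# Hypothetical stand-in for a Hessian: a random symmetric matrix.
demo_rng = np.random.default_rng(0)
raw = demo_rng.standard_normal((200, 200))
stand_in_hessian = (raw + raw.T) / 2

nodes, weights = slq_nodes_weights(lambda x: stand_in_hessian @ x, dim=200)

# Smooth the weighted nodes with Gaussian bumps to obtain a density curve.
grid = np.linspace(nodes.min() - 1.0, nodes.max() + 1.0, 400)
sigma = 0.5
density = (weights * np.exp(-0.5 * ((grid[:, None] - nodes) / sigma) ** 2)).sum(axis=1)
density /= sigma * np.sqrt(2.0 * np.pi)
```

Only `matvec` touches the matrix, so the same sketch applies when the matrix is too large to store explicitly. The number of Lanczos steps, the number of probe vectors, and the smoothing width `sigma` are illustrative defaults rather than values taken from the paper.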