Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods

Diego Granziol; Timur Garipov; Dmitry Vetrov; Stefan Zohren; Stephen Roberts; Andrew Gordon Wilson

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: Understanding the neural network Hessian eigenvalues under the data generating distribution.

Abstract: The geometric properties of loss surfaces, such as the local flatness of a solution, are associated with generalization in deep learning. The Hessian is often used to understand these geometric properties. We investigate the differences between the eigenvalues of the neural network Hessian evaluated over the empirical dataset, the Empirical Hessian, and the eigenvalues of the Hessian under the data generating distribution, which we term the True Hessian. Under mild assumptions, we use random matrix theory to show that the True Hessian has eigenvalues of smaller absolute value than the Empirical Hessian. We support these results for different SGD schedules on both a 110-layer ResNet and VGG-16. To perform these experiments we propose a framework for spectral visualization, based on GPU-accelerated stochastic Lanczos quadrature. This approach is an order of magnitude faster than state-of-the-art methods for spectral visualization, and can be generically used to investigate the spectral properties of matrices in deep learning.
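For readers unfamiliar with stochastic Lanczos quadrature, the following is a minimal CPU sketch in NumPy of the general technique, not the authors' GPU implementation. It assumes only a symmetric matrix that can be accessed through matrix-vector products (for a network, a Hessian-vector product would play this role). The names `lanczos` and `slq_spectrum`, the defaults for `num_steps` and `num_probes`, and the use of full reorthogonalization are illustrative choices, not taken from the paper.

```python
import numpy as np

def lanczos(matvec, dim, num_steps, rng):
    """Build the Lanczos tridiagonal matrix from a random unit start vector.

    Full reorthogonalization is used for numerical stability (an assumption
    made here for simplicity; it costs O(dim * num_steps) memory).
    """
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    basis = [v]
    alphas, betas = [], []
    for j in range(num_steps):
        w = matvec(basis[j])
        alphas.append(basis[j] @ w)
        # Remove components along every previous Lanczos vector.
        for u in basis:
            w = w - (u @ w) * u
        beta = np.linalg.norm(w)
        # Stop at the last step or when an invariant subspace is found.
        if j == num_steps - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append(w / beta)
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)

def slq_spectrum(matvec, dim, num_steps=30, num_probes=10, seed=0):
    """Approximate the spectral density as a weighted set of point masses.

    Returns (nodes, weights): Ritz values and their quadrature weights,
    averaged over `num_probes` random start vectors. Weights sum to 1.
    """
    rng = np.random.default_rng(seed)
    nodes, weights = [], []
    for _ in range(num_probes):
        T = lanczos(matvec, dim, num_steps, rng)
        evals, evecs = np.linalg.eigh(T)
        nodes.append(evals)
        # Squared first components of the eigenvectors are the quadrature weights.
        weights.append(evecs[0] ** 2 / num_probes)
    return np.concatenate(nodes), np.concatenate(weights)

# Usage on a small explicit symmetric matrix standing in for a Hessian.
A = np.diag(np.linspace(-1.0, 3.0, 40))
nodes, weights = slq_spectrum(lambda x: A @ x, dim=40, num_steps=15, num_probes=5)
```

The returned nodes and weights can be smoothed with a narrow Gaussian kernel to plot an approximate eigenvalue density. Only matrix-vector products are required, so the matrix itself is never formed.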

Code: https://drive.google.com/file/d/1JxmWXjMjJ12SooCZ4rxWHJVWIqAkekGY/view

Keywords: Random Matrix theory, deep learning, deep learning theory, hessian eigenvalues, true risk
