Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

Sep 28, 2020 (edited Jun 16, 2021) · ICLR 2021 Conference Blind Submission · Readers: Everyone
• Reviewed Version (pdf): https://openreview.net/references/pdf?id=7ziPOrWPBk
• Keywords: Hessian, neural network, Kronecker factorization, PAC-Bayes bound, eigenspace, eigenvalue
• Abstract: The Hessian captures important properties of the deep neural network loss landscape. We observe that the eigenvectors and eigenspaces of the layer-wise Hessian of the neural network objective have several interesting structures -- top eigenspaces for different models have high overlap, and top eigenvectors form low-rank matrices when they are reshaped into the same shape as the weight matrix of the corresponding layer. These structures, as well as the low-rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian using a Kronecker factorization. Our new understanding also explains why some of these structures become weaker when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to obtain better generalization bounds.
• One-sentence Summary: We investigate several interesting structures of the layer-wise Hessian by approximating it with a Kronecker factorization, and provide a non-vacuous PAC-Bayes generalization bound using the approximated Hessian eigenbasis.
• Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
• Supplementary Material: zip
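The low-rank eigenvector structure described in the abstract follows directly from properties of Kronecker products: if the layer-wise Hessian is approximated as H ≈ A ⊗ B, its eigenvectors are Kronecker products of the factors' eigenvectors, so reshaping one into the weight-matrix shape yields a rank-1 matrix. Below is a minimal NumPy sketch illustrating this; the random PSD factors `A` and `B` are hypothetical stand-ins (e.g. an input autocorrelation and an output-side curvature term in K-FAC-style approximations), not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 4

# Hypothetical symmetric PSD Kronecker factors standing in for the
# input-side and output-side terms of a factored layer-wise Hessian.
X = rng.standard_normal((d_in, d_in))
A = X @ X.T
Y = rng.standard_normal((d_out, d_out))
B = Y @ Y.T

# Kronecker-factored approximation of the layer-wise Hessian.
H = np.kron(A, B)  # shape (d_in * d_out, d_in * d_out)

# Top eigenvector of the approximated Hessian (eigh sorts ascending).
_, V = np.linalg.eigh(H)
v_top = V[:, -1]

# With H = kron(A, B), entries of v_top are ordered so that reshaping
# to (d_in, d_out) recovers an outer product a ⊗ b, i.e. a rank-1 matrix,
# mirroring the low-rank structure of reshaped top eigenvectors.
M = v_top.reshape(d_in, d_out)
print(np.linalg.matrix_rank(M, tol=1e-8))  # 1
```

The same argument applies to every eigenvector of the factored approximation, which is why the reshaped top eigenvectors observed for real layer-wise Hessians being near-low-rank is evidence for the Kronecker structure.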