Keywords: PAC-Bayes, Hessian, curvature, lower bound, Variational Inference
TL;DR: For PAC-Bayes bounds with multivariate Gaussian priors and posteriors, we show that for specific cases of DNNs it is impossible to prove generalization, assuming a second-order Taylor expansion of the empirical loss is tight.
Abstract: We investigate whether it is possible to tighten PAC-Bayes bounds for deep neural networks by utilizing the Hessian of the training loss at the minimum. For the case of Gaussian priors and posteriors, we introduce a Hessian-based method to obtain tighter PAC-Bayes bounds that relies on closed-form solutions of layerwise subproblems. We thus avoid commonly used variational inference techniques, which can be difficult to implement and time-consuming for modern deep architectures. We conduct a theoretical analysis that links the random initialization, the minimum, and the curvature at the minimum of a deep neural network to limits on what is provable about generalization through PAC-Bayes. Through careful experiments we validate our theoretical predictions and analyze the influence of the prior mean, prior covariance, posterior mean, and posterior covariance on obtaining tighter bounds.
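To make the ingredients concrete, here is a minimal sketch of the quantities the abstract refers to: the closed-form KL divergence between diagonal Gaussian prior and posterior, the expected empirical loss under a second-order Taylor expansion around the minimum, and a McAllester-style PAC-Bayes bound combining them. The diagonal-covariance restriction, the specific bound form, and all function names and toy values are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def gaussian_kl_diag(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, diag(var_q)) || N(mu_p, diag(var_p))) in closed form."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def expected_loss_second_order(loss_at_min, hess_diag, mu_q, var_q, w_min):
    """E_{w~Q}[L_hat(w)] under a second-order Taylor expansion of L_hat
    around the minimum w_min, with a diagonal Hessian approximation."""
    quad = 0.5 * np.sum(hess_diag * (mu_q - w_min) ** 2)   # shift of posterior mean
    trace_term = 0.5 * np.sum(hess_diag * var_q)           # tr(H * Sigma_Q) / 2
    return loss_at_min + quad + trace_term

def mcallester_bound(emp_loss_q, kl, n, delta=0.05):
    """McAllester-style PAC-Bayes bound on the expected true loss."""
    return emp_loss_q + np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * (n - 1)))

# Toy example: prior centred at the random initialization, posterior at the minimum.
d, n = 1000, 50_000
rng = np.random.default_rng(0)
w_init = rng.normal(0.0, 0.1, d)           # prior mean (random initialization)
w_min = w_init + rng.normal(0.0, 0.5, d)   # minimum reached by training
hess_diag = rng.uniform(0.0, 10.0, d)      # curvature (Hessian diagonal) at the minimum

var_p = np.full(d, 0.1 ** 2)               # prior covariance
var_q = np.full(d, 1e-3)                   # posterior covariance

emp_loss_q = expected_loss_second_order(0.01, hess_diag, w_min, var_q, w_min)
kl = gaussian_kl_diag(w_min, var_q, w_init, var_p)
print("PAC-Bayes bound:", mcallester_bound(emp_loss_q, kl, n))
```

The sketch illustrates the trade-off the paper studies: shrinking the posterior covariance reduces the curvature penalty tr(H Σ_Q)/2 but inflates the KL term, while the distance between initialization and minimum sets a floor on the KL, which is what limits how small the resulting bound can be made.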