Bayesian Neural Network Priors Revisited

Published: 09 Dec 2020, Last Modified: 22 Oct 2023
ICBINB 2020 Spotlight
Keywords: Bayesian neural networks, priors, Gaussian
TL;DR: Contrary to common practice, isotropic Gaussian distributions are not generally the best choice of priors for Bayesian neural networks.
Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely either to accurately reflect our true beliefs about the weight distributions or to give optimal performance. We study summary statistics of the weights of (convolutional) neural networks trained using SGD. We find that in certain circumstances these networks have heavy-tailed weight distributions, while convolutional network weights often display strong spatial correlations. Building these observations into the respective priors, we obtain improved performance on MNIST classification. Remarkably, we find that using a more accurate prior partially mitigates the cold posterior effect: it improves performance at high temperatures, which correspond to exact Bayesian inference, while having less of an effect at small temperatures.
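The two empirical observations above can be checked directly on a trained network. Below is a minimal sketch, not the paper's released code, assuming PyTorch and a hypothetical SGD-trained `model` (all names are illustrative): excess kurtosis above zero indicates tails heavier than a Gaussian, and a nonzero correlation between neighbouring filter taps indicates spatial structure.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an SGD-trained network; in practice,
# load the weights of an actually trained model.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3),
)

def excess_kurtosis(w: torch.Tensor) -> float:
    """Excess kurtosis of the flattened weights: 0 for a Gaussian,
    > 0 for heavy-tailed distributions."""
    z = (w.flatten() - w.mean()) / w.std()
    return (z ** 4).mean().item() - 3.0

def neighbour_correlation(w: torch.Tensor) -> float:
    """Pearson correlation between horizontally adjacent taps of
    conv kernels with shape (out_ch, in_ch, kH, kW)."""
    pairs = torch.stack([w[..., :, :-1].flatten(), w[..., :, 1:].flatten()])
    return torch.corrcoef(pairs)[0, 1].item()

for name, p in model.named_parameters():
    if p.dim() == 4:  # convolutional kernels only
        print(f"{name}: kurtosis={excess_kurtosis(p.data):+.2f}, "
              f"corr={neighbour_correlation(p.data):+.2f}")
```

Likewise, "building these observations into the prior" can be sketched as swapping the isotropic Gaussian log-prior for a heavy-tailed Student-t one inside a tempered log-posterior, where T = 1 corresponds to exact Bayesian inference and T < 1 to a "cold" posterior. The degrees of freedom and scale below are placeholder values, not the paper's fitted hyperparameters:

```python
import torch
from torch.distributions import StudentT

def studentt_log_prior(params, df=3.0, scale=0.1):
    """i.i.d. heavy-tailed (Student-t) log-prior over all weights;
    df and scale are placeholder hyperparameters."""
    prior = StudentT(df, loc=0.0, scale=scale)
    return sum(prior.log_prob(p).sum() for p in params)

def tempered_log_posterior(log_likelihood, params, T=1.0):
    """Tempered log-posterior (log lik + log prior) / T:
    T = 1 is exact Bayesian inference, T < 1 a 'cold' posterior."""
    return (log_likelihood + studentt_log_prior(params)) / T
```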
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2102.06571/code)