Bayesian Neural Network Priors Revisited

Published: 21 Dec 2020, Last Modified: 12 Mar 2024 · AABI 2020
Keywords: Bayesian neural networks, priors, Gaussian
TL;DR: We show that heavy-tailed and correlated priors can improve the performance of Bayesian neural networks and reduce the cold posterior effect.
Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely either to accurately reflect our true beliefs about the weight distributions or to give optimal performance. We study summary statistics of (convolutional) neural network weights in networks trained using SGD. We find that in certain circumstances, these networks have heavy-tailed weight distributions, while convolutional neural network weights often display strong spatial correlations. Building these observations into the respective priors, we obtain improved performance on MNIST classification. Remarkably, we find that using a more accurate prior partially mitigates the cold posterior effect: it improves performance at high temperatures, which correspond to exact Bayesian inference, while having less of an effect at low temperatures.
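The abstract describes three ingredients: diagnosing heavy tails in trained weights, replacing the isotropic Gaussian prior with a heavy-tailed prior (and a spatially correlated one for convolutional filters), and tempering the posterior. The NumPy/SciPy sketch below illustrates each ingredient; it is not the authors' code, and all numerical choices (degrees of freedom, scales, the covariance length scale, and the temperature) are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch (illustrative only; not the authors' implementation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 1. Summary statistics: heavy tails show up as positive excess kurtosis
#    (a Gaussian has excess kurtosis 0).
w = 0.1 * rng.standard_t(df=5, size=50_000)   # stand-in for SGD-trained weights
print("excess kurtosis:", stats.kurtosis(w))  # clearly > 0 here

# 2. Prior log-densities: a heavy-tailed Student-t prior typically assigns
#    higher total log-density to such weights than an isotropic Gaussian.
print("Gaussian :", stats.norm(scale=0.1).logpdf(w).sum())
print("Student-t:", stats.t(df=5, scale=0.1).logpdf(w).sum())

# 3. Spatially correlated prior for a 3x3 convolutional filter: a zero-mean
#    multivariate Gaussian whose covariance decays with pixel distance.
coords = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
cov = 0.1**2 * np.exp(-dist)                  # length scale of 1 pixel (assumed)
filt = rng.multivariate_normal(np.zeros(9), cov).reshape(3, 3)
print("correlated filter sample:\n", filt)

# 4. Tempered posterior: log p_T(w | D) = (log p(D | w) + log p(w)) / T + const.
#    T = 1 is exact Bayesian inference; T < 1 gives a "cold" posterior.
def tempered_log_posterior(log_lik, log_prior, T=1.0):
    return (log_lik + log_prior) / T
```

The kurtosis check in step 1 is the kind of summary statistic the abstract refers to, and step 4 makes explicit what "high" and "low" temperatures mean in the cold posterior effect.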
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2102.06571/code)