Can a Confident Prior Replace a Cold Posterior?

Published: 27 May 2024, Last Modified: 27 May 2024 · AABI 2024 · CC BY 4.0
Keywords: Bayesian neural networks, Bayesian deep learning, probabilistic deep learning, cold posteriors
Abstract: Benchmark datasets used for image classification tend to have very low levels of label noise. When Bayesian neural networks are trained on these datasets, they often underfit, misrepresenting the aleatoric uncertainty of the data. A common solution is to cool the posterior, which improves fit to the training data but is challenging to interpret from a Bayesian perspective. We introduce a clipped version of the Dirichlet prior that controls the aleatoric uncertainty of a Bayesian neural network, nearly matching the performance of cold posteriors while staying within the standard Bayesian framework. We explain why the Dirichlet prior must be clipped for training to converge, and we derive the conditions under which it is numerically stable.
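The abstract does not spell out the construction, but the idea can be sketched. A symmetric Dirichlet prior over the softmax outputs, p(π) ∝ ∏_c π_c^(α−1), with concentration α < 1 places high density on confident (low-entropy) predictions; because each term (α−1) log π_c diverges as π_c → 0, the log-density must be clipped to remain bounded. Below is a minimal PyTorch sketch under these assumptions; the function name, the value α = 0.9, and the clipping threshold are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def clipped_dirichlet_log_prior(logits, alpha=0.9, max_term=10.0):
    """Log-density (up to a constant) of a clipped symmetric Dirichlet
    prior over the softmax outputs. Hypothetical sketch: the exact
    parameterization used in the paper is not given in the abstract.

    log p(pi) = sum_c (alpha - 1) * log pi_c + const.
    With alpha < 1, each term tends to +inf as pi_c -> 0, so every
    per-class term is clipped above to keep the objective and its
    gradients finite.
    """
    log_probs = F.log_softmax(logits, dim=-1)       # log pi_c, shape (batch, classes)
    terms = (alpha - 1.0) * log_probs               # per-class prior terms
    return torch.clamp(terms, max=max_term).sum(dim=-1)
```

In training, such a term would be added to the data log-likelihood in the log-posterior, e.g. `loss = F.cross_entropy(logits, labels) - clipped_dirichlet_log_prior(logits).mean()`; how the paper weights and places this term, and the precise clipping condition it derives, are not recoverable from the abstract alone.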
Submission Number: 21