The Cold Posterior Effect Indicates Underfitting, and Cold Posteriors Represent a Fully Bayesian Method to Mitigate It

TMLR Paper2514 Authors

12 Apr 2024 (modified: 22 Jun 2024) · Decision pending for TMLR · CC BY-SA 4.0
Abstract: The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning is the observation that posteriors with a temperature $T<1$ can yield a posterior predictive that performs better than that of the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of the CPE as a model misspecification problem, arising from the prior and/or the likelihood. In this work, we provide a more nuanced understanding of the CPE by showing that misspecification leads to the CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE. Furthermore, we show that these tempered posteriors with $T<1$ are indeed proper Bayesian posteriors with a different combination of likelihood and prior, parameterized by $T$. This observation validates adjusting the temperature hyperparameter $T$ as a straightforward approach to mitigating underfitting in the Bayesian posterior. In essence, we show that by fine-tuning the temperature $T$ we implicitly use alternative Bayesian posteriors, albeit with less misspecified likelihood and prior distributions.
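The claim that a tempered posterior is itself a proper Bayesian posterior can be illustrated with a minimal sketch on a toy conjugate model. Everything below is an assumption made for illustration and is not taken from the paper: the model (Gaussian observations with known noise variance and a zero-mean Gaussian prior on the mean), the tempering convention (raising the product of likelihood and prior to the power $1/T$, which may differ from the convention the paper uses), and all function names.

```python
def gaussian_posterior(ys, noise_var, prior_var):
    """Exact posterior N(mean, var) over theta for the assumed toy model:
    y_i ~ N(theta, noise_var), theta ~ N(0, prior_var)."""
    precision = len(ys) / noise_var + 1.0 / prior_var
    mean = (sum(ys) / noise_var) / precision
    return mean, 1.0 / precision


def tempered_posterior(ys, noise_var, prior_var, T):
    """Posterior proportional to (likelihood * prior)^(1/T).
    Raising a Gaussian density to the power 1/T and renormalising
    keeps its mean and multiplies its variance by T."""
    mean, var = gaussian_posterior(ys, noise_var, prior_var)
    return mean, T * var


def rescaled_model_posterior(ys, noise_var, prior_var, T):
    """Ordinary (T = 1) Bayesian posterior under a different model:
    noise variance and prior variance are both multiplied by T."""
    return gaussian_posterior(ys, T * noise_var, T * prior_var)


if __name__ == "__main__":
    ys = [0.5, 1.2, 0.9]  # hypothetical observations
    T = 0.5               # a "cold" temperature
    print(tempered_posterior(ys, 1.0, 4.0, T))
    print(rescaled_model_posterior(ys, 1.0, 4.0, T))
```

In this toy setting the two functions return the same mean and variance, so the cold posterior coincides with the standard posterior of a model whose likelihood and prior are both sharper by a factor of $T$. This is only a special case under the stated assumptions; the paper's general result concerns deep models and is not reproduced here.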
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Fortuin1
Submission Number: 2514