The Cold Posterior Effect Indicates Underfitting, and Cold Posteriors Represent a Fully Bayesian Method to Mitigate It

TMLR Paper2514 Authors

12 Apr 2024 (modified: 18 Apr 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive can perform better than that of the standard Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of the CPE as a model misspecification problem arising from the prior and/or the likelihood. In this work, we provide a more nuanced understanding of the CPE by showing that \emph{misspecification leads to the CPE only when the resulting Bayesian posterior underfits}. In fact, we theoretically show that if there is no underfitting, there is no CPE. Furthermore, we show that these \emph{tempered posteriors} with $T < 1$ are proper Bayesian posteriors corresponding to a different combination of likelihood and prior parameterized by $T$. Within the \textit{empirical Bayes} framework, this observation justifies adjusting the temperature hyperparameter $T$ as a straightforward way to mitigate underfitting in the Bayesian posterior. In essence, we show that by tuning the temperature $T$ we implicitly use alternative Bayesian posteriors, albeit with less misspecified likelihood and prior distributions.
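To make the reinterpretation in the abstract concrete, here is a minimal sketch of the usual tempering identity (our own regrouping, assuming the tempered factors can be normalized; the paper's precise construction may differ). Tempering both likelihood and prior at temperature $T$ gives

$$p_T(\theta \mid D) \;\propto\; \big(p(D \mid \theta)\, p(\theta)\big)^{1/T} \;=\; p(D \mid \theta)^{1/T}\, p(\theta)^{1/T} \;\propto\; \tilde{p}(D \mid \theta)\, \tilde{p}(\theta),$$

with $\tilde{p}(D \mid \theta) \propto p(D \mid \theta)^{1/T}$ and $\tilde{p}(\theta) \propto p(\theta)^{1/T}$. Read this way, a cold posterior ($T<1$) is an ordinary Bayesian posterior under a modified likelihood and prior indexed by $T$, and tuning $T$ by empirical Bayes amounts to selecting within this family, which is how the abstract frames cold posteriors as a fully Bayesian way to mitigate underfitting.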
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Vincent_Fortuin1
Submission Number: 2514