The Cold Posterior Effect Indicates Underfitting, and Cold Posteriors Represent a Fully Bayesian Method to Mitigate It
Abstract: The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive can outperform the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of the CPE as a model misspecification problem, arising from the prior and/or from the likelihood. In this work, we provide a more nuanced understanding of the CPE by showing that misspecification leads to the CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE. Furthermore, we show that these tempered posteriors with $T < 1$ are indeed proper Bayesian posteriors with a different combination of likelihoods and priors parameterized by $T$. This observation validates the adjustment of the temperature hyperparameter $T$ as a straightforward approach to mitigate underfitting in the Bayesian posterior. In essence, we show that by fine-tuning the temperature $T$ we implicitly use alternative Bayesian posteriors, albeit with less misspecified likelihood and prior distributions. The code for replicating the experiments can be found at https://github.com/pyijiezhang/cpe-underfit.
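The abstract's claim that a tempered posterior with $T<1$ is itself a proper Bayesian posterior can be checked numerically. The following is a minimal sketch (not from the paper's codebase; the toy Gaussian model and grid are illustrative assumptions): on a 1-D parameter grid, the normalized tempered posterior $\propto [p(D\mid\theta)\,p(\theta)]^{1/T}$ coincides with the ordinary Bayesian posterior built from the "raised" likelihood $p(D\mid\theta)^{1/T}$ and "raised" prior $\propto p(\theta)^{1/T}$.

```python
import numpy as np

# Toy illustration (hypothetical setup, not the paper's experiments):
# a 1-D Gaussian location model evaluated on a dense parameter grid.
theta = np.linspace(-3.0, 3.0, 601)            # parameter grid
log_prior = -0.5 * theta**2                    # unnormalized N(0,1) log-prior
data = np.array([0.8, 1.1, 0.9])               # toy observations
# Gaussian log-likelihood, summed over observations, per grid point.
log_lik = -0.5 * ((data[:, None] - theta[None, :]) ** 2).sum(axis=0)

T = 0.5                                        # "cold" temperature, T < 1

def normalize(p):
    """Normalize an unnormalized density on the grid."""
    return p / p.sum()

# Tempered posterior: [likelihood * prior]^(1/T), then normalized.
tempered = normalize(np.exp((log_lik + log_prior) / T))

# The same object as a *proper* Bayesian posterior whose likelihood and
# prior are both raised to the power 1/T (i.e., parameterized by T).
raised = normalize(np.exp(log_lik / T) * np.exp(log_prior / T))

assert np.allclose(tempered, raised)
```

The identity holds because $[p(D\mid\theta)\,p(\theta)]^{1/T} = p(D\mid\theta)^{1/T}\,p(\theta)^{1/T}$, so tuning $T$ amounts to swapping in an alternative likelihood–prior pair rather than leaving the Bayesian framework.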
Submission Length: Long submission (more than 12 pages of main content)
Supplementary Material: zip
Changes Since Last Submission: [EiC] Upon author request: corrected the author order and uploaded a new PDF that includes the Appendix.
Code: https://github.com/pyijiezhang/cpe-underfit
Assigned Action Editor: ~Vincent_Fortuin1
Submission Number: 2514