Keywords: Calibration, Calibration error, Classification
TL;DR: An extension of a variational calibration error estimator, originally introduced for proper calibration errors, to a large class of non-proper calibration errors induced by $L_p$ divergences.
Abstract: Calibration—the problem of ensuring that predicted probabilities align with observed class frequencies—is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code into the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.
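To make the quantity in the abstract concrete, below is a minimal sketch of the classical binned (non-variational) estimate of an $L_p$ calibration error for a binary classifier, i.e., the bin-weighted $p$-th power of the gap between mean confidence and empirical frequency. This is a generic illustration of the notion of calibration error, not the paper's variational estimator and not the probmetrics API; the function name and parameters are hypothetical.

```python
import numpy as np

def binned_lp_calibration_error(probs, labels, p=2, n_bins=15):
    """Hypothetical binned estimate of the L_p calibration error
    for a binary classifier.

    probs:  predicted probabilities of the positive class, shape (n,)
    labels: binary labels in {0, 1}, shape (n,)
    Returns the p-th root of the bin-weighted mean of
    |mean confidence - empirical frequency|^p.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to an equal-width bin on [0, 1].
    bin_ids = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        conf = probs[mask].mean()   # average predicted probability in bin
        freq = labels[mask].mean()  # observed class frequency in bin
        total += mask.mean() * abs(conf - freq) ** p
    return total ** (1.0 / p)

# Usage on synthetic, perfectly calibrated predictions:
rng = np.random.default_rng(0)
probs = rng.uniform(size=10_000)
labels = (rng.uniform(size=10_000) < probs).astype(int)
print(binned_lp_calibration_error(probs, labels, p=2))
```

On perfectly calibrated data such as this synthetic example, the true calibration error is zero, yet the binned estimate is typically positive; this is the overestimation issue that, per the abstract, the variational approach avoids.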
Submission Number: 19