Abstract: In safety-critical applications, a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibrating only the maximal predictions is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. For a specific class of measures based on matrix-valued kernels, we suggest different consistent and unbiased estimators and evaluate them empirically. Importantly, these estimators can be interpreted as test statistics associated with well-defined probabilities of false rejection, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.
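As background for the measures the abstract mentions, a minimal sketch of the standard confidence-based expected calibration error (the baseline that the paper generalizes to full multi-class calibration) might look as follows. The binning scheme, function name, and parameters here are illustrative assumptions, not the paper's estimators:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Sketch of the standard ECE over the maximal (confidence) predictions.

    probs: (n, k) array of predicted class probabilities
    labels: (n,) array of true class indices
    """
    confidences = probs.max(axis=1)      # confidence of the maximal prediction
    predictions = probs.argmax(axis=1)   # predicted class
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # weight each bin's |accuracy - confidence| gap by its relative size
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```

Note that this estimator only checks the maximal predictions, which is exactly the insufficiency the abstract points out; the paper's kernel-based measures instead assess calibration of the full predicted probability vectors.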
Code Link: https://github.com/devmotion/CalibrationPaper
CMT Num: 6628