Abstract: In safety-critical applications, a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibrating only the maximal predictions is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. For a specific class of measures based on matrix-valued kernels, we suggest different consistent and unbiased estimators and evaluate them empirically. Importantly, these estimators can be interpreted as test statistics associated with well-defined probabilities of false rejection, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.
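As background for the measures the abstract mentions, a minimal sketch of the standard confidence-based expected calibration error (the baseline that the paper generalizes to full multi-class calibration) might look as follows. The binning scheme, function name, and parameters here are illustrative assumptions, not the paper's estimators:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Sketch of the standard ECE over the maximal (confidence) predictions.

    probs: (n, k) array of predicted class probabilities
    labels: (n,) array of true class indices
    """
    confidences = probs.max(axis=1)      # confidence of the maximal prediction
    predictions = probs.argmax(axis=1)   # predicted class
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # weight each bin's |accuracy - confidence| gap by its relative size
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece
```

Note that this estimator only checks the maximal predictions, which is exactly the insufficiency the abstract points out; the paper's kernel-based measures instead assess calibration of the full predicted probability vectors.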
Code Link: https://github.com/devmotion/CalibrationPaper
CMT Num: 6628