Most classifiers based on deep neural networks associate their class prediction with a probability known as the confidence score. This score is often a by-product of the learning step and may not accurately reflect the classification accuracy, which limits its use in real-world applications. To be reliably used, the confidence score requires a post-processing calibration step. Data-driven methods have been proposed to calibrate the confidence score of already-trained classifiers. Still, many methods fail when the number of classes is high and per-class calibration data is scarce. To deal with a large number of classes, we propose to reformulate the confidence calibration of multiclass classifiers as a single binary classification problem. Our top-versus-all reformulation allows the use of the binary cross-entropy loss in scaling calibration methods. Unlike the standard one-versus-all reformulation, it also allows binary calibration methods to be applied to multiclass classifiers, making efficient use of scarce per-class calibration data without degrading accuracy. Additionally, we address the tendency of scaling methods to overfit the calibration set by introducing a regularization term into the optimization loss. We evaluate our approach on an extensive list of deep networks and standard image classification datasets (CIFAR-10, CIFAR-100, and ImageNet). We show that it significantly improves the performance of existing calibration methods.
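To make the top-versus-all (TvA) idea concrete, below is a minimal PyTorch sketch of temperature scaling fitted with a binary cross-entropy loss on a single binary problem: whether the classifier's top prediction is correct. The function name `fit_temperature_tva` and its parameters are illustrative assumptions, not the paper's actual code, and the regularization term mentioned in the abstract is omitted since its exact form is not specified here.

```python
import torch
import torch.nn.functional as F

def fit_temperature_tva(logits, labels, lr=0.01, steps=200):
    """Sketch: temperature scaling trained with a top-versus-all BCE loss.

    logits: (N, C) raw classifier outputs on the calibration set.
    labels: (N,) ground-truth class indices.
    Returns the fitted temperature T > 0.
    """
    preds = logits.argmax(dim=1)                # top-label predictions
    correct = (preds == labels).float()         # binary TvA targets: top prediction correct?
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so that T = exp(log_t) stays positive
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        probs = F.softmax(logits / log_t.exp(), dim=1)
        # Confidence of the predicted (top) class only: one binary problem,
        # rather than C one-versus-all problems.
        top_conf = probs.gather(1, preds.unsqueeze(1)).squeeze(1)
        loss = F.binary_cross_entropy(top_conf, correct)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()
```

In this sketch, the only data the calibrator sees are the top-class confidences and correctness indicators, which is what allows scarce per-class calibration data to be pooled into one binary problem; since only the confidence score is rescaled and the argmax is unchanged, accuracy is preserved.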