Efficient calibration as a binary top-versus-all problem for classifiers with many classes

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Calibration, Image Classification, Deep Learning, Neural Networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a reformulation of confidence calibration as a binary problem to significantly improve the performance of existing calibration methods for classifiers with many classes.
Abstract: Most classifiers based on deep neural networks associate their class prediction with a probability known as the confidence score. This score is often a by-product of the learning step and may not reliably estimate the probability that the prediction is correct, which limits its real-world use. To be used reliably, the confidence score requires a post-processing calibration step. Data-driven methods have been proposed to calibrate the confidence scores of already-trained classifiers, but many of them fail when the number of classes is high and per-class calibration data is scarce. To deal with a large number of classes, we propose to reformulate the confidence calibration of multiclass classifiers as a single binary classification problem. Our top-versus-all reformulation allows the use of the binary cross-entropy loss for scaling calibration methods. Unlike the standard one-versus-all reformulation, it also allows binary calibration methods to be applied to multiclass classifiers, making efficient use of scarce per-class calibration data without degrading accuracy. Additionally, we address the tendency of scaling methods to overfit the calibration set by introducing a regularization term in the optimization loss. We evaluate our approach on an extensive list of deep networks and standard image classification datasets (CIFAR-10, CIFAR-100, and ImageNet) and show that it significantly improves the performance of existing calibration methods.
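To illustrate the idea described in the abstract (not the authors' implementation), the following minimal PyTorch sketch fits a single temperature with a top-versus-all binary cross-entropy objective: the top-class confidence is trained against a correct/incorrect target. The function name, hyperparameters, and the quadratic regularizer are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn.functional as F

def fit_top_vs_all_temperature(logits, labels, reg_weight=0.0, steps=200, lr=0.01):
    """Fit a temperature T by treating calibration as a binary top-versus-all
    problem: the confidence of the predicted class is optimized with a binary
    cross-entropy loss against a correct/incorrect indicator.

    `reg_weight` weights a hypothetical regularizer (here, a quadratic penalty
    pulling T toward 1) standing in for the paper's regularization term.
    """
    logits = logits.detach()
    correct = (logits.argmax(dim=1) == labels).float()  # binary target: was the top prediction right?
    log_temp = torch.zeros(1, requires_grad=True)       # optimize log T so that T stays positive
    optimizer = torch.optim.Adam([log_temp], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        temp = log_temp.exp()
        probs = torch.softmax(logits / temp, dim=1)
        top_conf = probs.max(dim=1).values.clamp(1e-6, 1 - 1e-6)  # confidence of the predicted class
        loss = F.binary_cross_entropy(top_conf, correct)
        loss = loss + reg_weight * (temp - 1.0) ** 2    # assumed form of the regularization term
        loss.backward()
        optimizer.step()

    return log_temp.exp().item()
```

A fitted temperature would then be applied to test-time logits as `softmax(logits / T)`; since dividing by a positive scalar does not change the argmax, the classifier's predictions, and hence its accuracy, are unchanged, consistent with the abstract's claim.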
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5749