Keywords: selective classification, distribution shift, deep learning, uncertainty estimation, failure prediction, misclassification detection, reject option, neural networks
TL;DR: Post-hoc methods that improve selective classification performance in-distribution yield similar improvements under distribution shift.
Abstract: This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. Specifically, we investigate whether the selective classification performance of ImageNet classifiers is robust to distribution shift. Motivated by the intriguing observation in recent work that many classifiers appear to have a ``broken'' confidence estimator, we start by evaluating methods to fix this issue. We focus on so-called post-hoc methods, which replace the confidence estimator of a given classifier without retraining or modifying it, making them practically appealing.
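For reference, a selective classifier is commonly formalized as a pair $(f, g)$ of a classifier $f$ and a confidence estimator $g$, where predictions with confidence below a threshold $t$ are rejected; this is the standard formulation from the selective prediction literature, given here only as a sketch of the setting:
$$
(f, g)(x) =
\begin{cases}
f(x), & \text{if } g(x) \ge t, \\
\text{abstain}, & \text{otherwise.}
\end{cases}
$$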
We perform an extensive experimental study of many existing and proposed confidence estimators applied to 84 pre-trained ImageNet classifiers available from popular repositories. Our results show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance, completely fixing the pathological behavior observed in many classifiers. As a consequence, the selective classification performance of any classifier becomes almost entirely determined by its corresponding accuracy. We then show that these results are consistent under distribution shift: a method that enhances performance in the in-distribution scenario also provides similar gains under distribution shift. Moreover, although a slight degradation in selective classification performance is observed under distribution shift, it can be explained by the classifier's drop in accuracy, together with the slight dependence of selective classification performance on accuracy.
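As an illustration of the confidence estimator described above (p-norm normalization of the logits followed by taking the maximum), the following is a minimal NumPy sketch; the function name, the choice $p = 2$, and the 20% rejection fraction are assumptions made for illustration, not the paper's exact implementation:

```python
import numpy as np

def max_logit_p_norm(logits: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Confidence score: maximum logit after p-norm normalization.

    logits: shape (n_samples, n_classes), raw (pre-softmax) outputs.
    p: norm order, a hyperparameter typically tuned on held-out data.
    """
    norms = np.linalg.norm(logits, ord=p, axis=1, keepdims=True)
    normalized = logits / np.clip(norms, 1e-12, None)  # guard against zero norm
    return normalized.max(axis=1)

# Usage: abstain on the fraction of test points with the lowest confidence.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 1000))    # placeholder logits, not real model outputs
confidence = max_logit_p_norm(logits, p=2.0)
threshold = np.quantile(confidence, 0.2)  # reject the 20% least confident
accept = confidence >= threshold          # predictions kept by the selective classifier
```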
Submission Number: 66