How to measure deep uncertainty estimation performance and which models are naturally better at providing it

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted
Keywords: non-Bayesian uncertainty estimation, selective prediction, transformer, vision transformer, vit, risk-coverage curve, selective classification, classification with a reject option
Abstract: When deployed for risk-sensitive tasks, deep neural networks (DNNs) must be equipped with an uncertainty estimation mechanism. This paper studies the relationship between deep architectures, their training regimes, and their uncertainty estimation performance. We consider both in-distribution uncertainties ("aleatoric" or "epistemic") and class-out-of-distribution ones. Moreover, we consider some of the most popular estimation performance metrics previously proposed, including AUROC, ECE, AURC, and coverage under a selective accuracy constraint. We present a comprehensive study that evaluates the uncertainty estimation performance of 484 deep ImageNet classification models. We identify numerous previously unknown factors that affect uncertainty estimation and examine the relationships between the different metrics. We find that distillation-based training regimes consistently yield better uncertainty estimates than other training schemes, such as vanilla training, pretraining on a larger dataset, and adversarial training. We also provide strong empirical evidence that ViT is by far the superior architecture in terms of uncertainty estimation performance, by every metric, in both in-distribution and class-out-of-distribution scenarios. Along the way, we learn several interesting facts. Contrary to previous work, ECE does not necessarily worsen as the number of network parameters increases. Likewise, we observe an unprecedented 99% top-1 selective accuracy at 47% coverage (and 95% top-1 selective accuracy at 80% coverage) for a ViT model, whereas a competing EfficientNet-V2-XL cannot meet these accuracy constraints at any level of coverage.
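The selective-prediction quantities named in the abstract (the risk-coverage curve, AURC, and coverage under a selective accuracy constraint) follow a standard recipe: rank samples by a confidence score and sweep a rejection threshold. Below is a minimal NumPy sketch, not the authors' code; it assumes `confidence` holds one scalar score per sample (e.g., the max softmax probability) and `correct` marks top-1 correctness, and all function names are illustrative.

import numpy as np

def risk_coverage(confidence: np.ndarray, correct: np.ndarray):
    """Risk-coverage curve: selective error at every coverage level,
    sorting samples by descending confidence."""
    order = np.argsort(-confidence)
    errors = 1.0 - correct[order].astype(float)
    n = errors.size
    coverage = np.arange(1, n + 1) / n                 # fraction of samples accepted
    risk = np.cumsum(errors) / np.arange(1, n + 1)     # error rate on accepted samples
    return coverage, risk

def aurc(confidence: np.ndarray, correct: np.ndarray) -> float:
    """Area under the risk-coverage curve (lower is better)."""
    coverage, risk = risk_coverage(confidence, correct)
    return float(np.trapz(risk, coverage))

def coverage_at_accuracy(confidence: np.ndarray, correct: np.ndarray,
                         target_acc: float = 0.99) -> float:
    """Largest coverage whose selective accuracy is at least target_acc
    (0.0 if the constraint is never met, as for the EfficientNet-V2-XL case)."""
    coverage, risk = risk_coverage(confidence, correct)
    ok = (1.0 - risk) >= target_acc
    return float(coverage[ok].max()) if ok.any() else 0.0

Under these assumptions, coverage_at_accuracy(confidence, correct, 0.99) would compute the kind of "coverage at 99% top-1 selective accuracy" figure quoted above.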
One-sentence Summary: Analyzing 484 deep neural models for ImageNet classification, we identify the architectures and training regimes that yield the best uncertainty estimation.
Supplementary Material: zip
