- Abstract: Deep neural networks (DNNs) have been shown to be brittle to inputs outside the training distribution and to adversarial examples. This fragility is compounded by the lack of effectively computable measures of prediction confidence that correlate with the accuracy of DNNs; the direct use of logits severely overestimates confidence. These factors have impeded the adoption of DNNs in high-assurance systems. In this paper, we propose a novel confidence metric that does not require access to the training data, model ensembles, or a calibration model trained on a held-out validation set, and hence is usable even when only a trained model is available at inference time. A lightweight way to quantify uncertainty in a model's output and define a confidence metric is to measure the conformance of the model's decision in the neighborhood of the input. But measuring conformance by sampling in the neighborhood of an input becomes exponentially harder as the input dimension grows. We use the feature concentration observed in robust models for local dimensionality reduction, and perform attribution-based sampling over the features to compute the confidence metric. We mathematically motivate the proposed metric and evaluate its effectiveness with two sets of experiments. First, we study the change in accuracy and the associated confidence on out-of-distribution inputs and evaluate the correlation between accuracy and the computed confidence; we also compare our results against the use of logits to estimate uncertainty. Second, we consider attacks such as FGSM, CW, DeepFool, PGD, and adversarial patch generation methods. The computed confidence metric is found to be low on out-of-distribution data and adversarial examples, where the accuracy of the model is also low. These experiments demonstrate the effectiveness of the proposed confidence metric in making DNNs more transparent about the uncertainty of their predictions.
- Code Link: https://github.com/trinityQ
- CMT Num: 6347
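The core idea in the abstract — confidence as the conformance of the model's decision under perturbations restricted to the most salient features — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `predict` function, the toy linear classifier, and all parameter names (`eps`, `top_k`, `n_samples`) are assumptions made for the example. Attributions are taken here as the linear model's weights, since for a linear model the input gradient equals the weight vector.

```python
import numpy as np

def conformance_confidence(predict, x, attributions,
                           n_samples=200, eps=0.1, top_k=5, seed=0):
    """Estimate confidence as the fraction of neighborhood samples whose
    prediction agrees with the prediction at x. Only the top_k features
    by attribution magnitude are perturbed (local dimensionality
    reduction via attribution-based sampling)."""
    rng = np.random.default_rng(seed)
    base = predict(x)
    # Keep sampling tractable: perturb only the most salient features.
    top = np.argsort(np.abs(attributions))[-top_k:]
    agree = 0
    for _ in range(n_samples):
        x_p = x.copy()
        x_p[top] += rng.uniform(-eps, eps, size=top_k)
        agree += int(predict(x_p) == base)
    return agree / n_samples

# Toy linear classifier: class = 1 if w.x > 0 else 0 (illustrative only).
w = np.array([3.0, 0.1, 0.1, 0.1, 2.0, 0.05, 0.05, 0.05])
predict = lambda x: int(w @ x > 0)
attr = w.copy()  # gradient of a linear model w.r.t. the input is w

x_confident = np.ones(8)    # far from the decision boundary
x_borderline = np.zeros(8)  # exactly on the decision boundary

c_hi = conformance_confidence(predict, x_confident, attr)
c_lo = conformance_confidence(predict, x_borderline, attr)
```

For the input far from the boundary, essentially all neighborhood samples conform (`c_hi` near 1.0), while for the borderline input roughly half of the perturbations flip the decision (`c_lo` near 0.5) — matching the paper's claim that low conformance flags inputs where the prediction is unreliable.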