SafetyCage: A misclassification detector for feed-forward neural networks

Published: 03 Nov 2023, Last Modified: 23 Dec 2023, NLDL 2024
Keywords: Misclassification detection, neural networks, Mahalanobis distance
TL;DR: We propose a misclassification detection procedure based on a hypothesis test of the pre-activation values of an MLP against their multivariate probability distributions. The resulting p-values are used to accept or reject the classification of the input sample.
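The idea in the TL;DR can be illustrated with a minimal sketch. It assumes that the pre-activation values of one hidden layer have already been extracted for each sample. It also assumes that each class is modelled by a mean and covariance fitted to training samples that the classifier got right. The test statistic is the squared Mahalanobis distance to the predicted class. The class name `MahalanobisGate`, its parameters `alpha` and `ridge`, and the empirical p-value (the rank of the test distance among the training distances) are illustrative choices, not the paper's exact procedure. SafetyCage's actual null distribution and the way it combines layers may differ.

```python
import numpy as np


class MahalanobisGate:
    """Hypothetical per-class Mahalanobis test on one layer's pre-activations.

    Assumptions (illustrative, not taken from the paper): each class is modelled
    by a mean and covariance fitted to pre-activations of correctly classified
    training samples, and p-values are empirical (rank among training distances).
    """

    def __init__(self, alpha=0.05, ridge=1e-6):
        self.alpha = alpha  # significance level for rejecting a prediction
        self.ridge = ridge  # small diagonal term keeping the covariance invertible
        self.stats = {}     # class -> (mean, precision matrix, sorted train distances)

    @staticmethod
    def _dist2(x, mean, prec):
        # Squared Mahalanobis distance of every row of x to the class mean.
        d = x - mean
        return np.einsum("ij,jk,ik->i", d, prec, d)

    def fit(self, feats, labels):
        # feats: (n, d) pre-activations; labels: (n,) classes.
        # The caller should pass only correctly classified samples.
        for c in np.unique(labels):
            xc = feats[labels == c]
            mean = xc.mean(axis=0)
            cov = np.cov(xc, rowvar=False) + self.ridge * np.eye(xc.shape[1])
            prec = np.linalg.inv(cov)
            self.stats[c] = (mean, prec, np.sort(self._dist2(xc, mean, prec)))
        return self

    def p_values(self, feats, preds):
        # One p-value per sample, tested against its *predicted* class.
        p = np.empty(len(feats))
        for i, (x, c) in enumerate(zip(feats, preds)):
            mean, prec, ref = self.stats[c]
            d2 = self._dist2(x[None, :], mean, prec)[0]
            # Count reference distances >= d2; the +1 terms give a valid p-value.
            n_ge = len(ref) - np.searchsorted(ref, d2, side="left")
            p[i] = (n_ge + 1) / (len(ref) + 1)
        return p

    def accept(self, feats, preds):
        # Accept a classification only if its p-value is at least alpha.
        return self.p_values(feats, preds) >= self.alpha


if __name__ == "__main__":
    # Synthetic stand-in for pre-activations of two well-separated classes.
    rng = np.random.default_rng(0)
    train = np.vstack([rng.normal(0.0, 1.0, (500, 3)), rng.normal(5.0, 1.0, (500, 3))])
    labels = np.repeat([0, 1], 500)
    gate = MahalanobisGate(alpha=0.05).fit(train, labels)

    queries = np.array([[0.1, -0.2, 0.0], [5.0, 5.0, 5.0], [0.0, 0.0, 0.0]])
    preds = np.array([0, 1, 1])  # the last prediction contradicts the sample's location
    print(gate.p_values(queries, preds), gate.accept(queries, preds))
```

In this toy setup, the third query is labelled class 1 but sits at the class-0 mean. Its distance exceeds every class-1 reference distance, so its p-value is the minimum possible value, and the prediction is flagged. An empirical p-value was chosen here to avoid assuming that the Mahalanobis distances follow a chi-square distribution.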
Abstract: Deep learning classifiers have reached state-of-the-art performance in many fields, particularly in image classification. A wrong class assignment is often inconsequential when distinguishing pictures of cats and dogs. In more critical applications, however, such as autonomous vehicles or industrial process control, misclassifications can lead to disastrous events. Reducing the classifier's error rate is of primary importance, but it is impossible to eliminate errors completely. A system that can flag wrong or suspicious classifications is therefore a necessary component for safe and robust operation. In this work, we present a general statistical inference framework for detecting misclassifications. We test our approach on two well-known benchmark datasets: MNIST and CIFAR-10. We show that, provided the underlying classifier is well trained, SafetyCage is effective at flagging wrong classifications. We also include a detailed discussion of the approach's drawbacks and of what can be done to improve it.
Permission: pdf
Submission Number: 12