SPARDACUS SafetyCage: A new misclassification detector

Published: 06 Nov 2024, Last Modified: 14 Nov 2024, NLDL 2025 Oral, CC BY 4.0
Keywords: Misclassification detection, uncertainty estimation, Wasserstein distance, hypothesis tests
TL;DR: This work introduces a misclassification detector for neural network architectures called SPARDACUS. The detector performs on par with, or slightly better than, state-of-the-art methods, while having several properties that make it more flexible and more powerful.
Abstract: Given the increasing adoption of machine learning techniques in society and industry, it is important to put procedures in place that can infer and signal whether the prediction of an ML model may be unreliable. This is relevant not only for ML specialists, but also for laypersons who may be end-users. In this work, we present a new method for flagging possible misclassifications from a feed-forward neural network in a general multi-class problem, called the SPARDA-enabled Classification Uncertainty Scorer (SPARDACUS). For each class and layer, the probability distribution functions of the activations for both correctly and wrongly classified samples are recorded. Using a Sparse Difference Analysis (SPARDA) approach, an optimal projection along the direction maximizing the Wasserstein distance enables $p$-value computations to confirm or reject the class prediction. Importantly, while most existing methods act on the output layer only, our method can in addition be applied to the hidden layers of the neural network, making it useful in applications, such as feature extraction, that necessarily exploit the intermediate (hidden) layers. We test our method on both a well-performing and an under-performing classifier, on different datasets, and compare with previously published approaches. Notably, while achieving performance on par with two state-of-the-art methods, our method significantly extends them in flexibility and applicability. We further find, for the models and datasets chosen, that the output layer is indeed the most valuable for misclassification detection, and that adding information from earlier layers does not necessarily improve performance in such cases.
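The core idea in the abstract, finding a projection direction that maximizes the 1-D Wasserstein distance between activations of correctly and wrongly classified samples, then scoring a new sample with an empirical $p$-value, can be sketched as follows. This is a simplified illustration, not the authors' SPARDA implementation: it replaces the sparse-difference optimization with a naive random-direction search, and all function names, data, and parameters are invented for the example.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def best_projection(correct, wrong, n_dirs=500):
    """Crude stand-in for SPARDA: sample random unit directions and keep
    the one maximizing the 1-D Wasserstein distance between the projected
    activations of correctly vs. wrongly classified samples."""
    d = correct.shape[1]
    best_w, best_dir = -1.0, None
    for _ in range(n_dirs):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        w = wasserstein_distance(correct @ v, wrong @ v)
        if w > best_w:
            best_w, best_dir = w, v
    return best_dir, best_w

def p_value(sample, correct, direction):
    """Empirical two-sided p-value of a new sample's projected activation
    under the reference distribution of correctly classified samples.
    A small p-value suggests the class prediction may be unreliable."""
    ref = np.sort(correct @ direction)
    frac = np.searchsorted(ref, sample @ direction) / len(ref)
    return 2 * min(frac, 1 - frac)

# Toy activations for one class at one layer: the "wrong" population
# is shifted along a single feature axis.
correct = rng.normal(0.0, 1.0, size=(400, 16))
wrong = rng.normal(0.0, 1.0, size=(400, 16))
wrong[:, 3] += 3.0

direction, w = best_projection(correct, wrong)
p_typical = p_value(correct[0], correct, direction)   # a typical correct sample
p_suspect = p_value(wrong[0], correct, direction)     # a shifted sample
```

In the paper's setting this test would be run per class and per layer (output or hidden), with a rejected $p$-value flagging a possible misclassification; the random search above merely stands in for the sparse, optimized projection that SPARDA computes.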
Submission Number: 29