Mahalanobis and max-softmax : but why? A comprehensive study of the benchmark scores in Adversarial Attacks Detection

Raphaël Thabut, Eric Vong

19 Mar 2023 (modified: 21 Mar 2023) | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: Transformers (Vaswani et al., 2017) and other Deep Learning architectures have gained considerable traction lately, as seen with the public release of Chat-GPT3 (Brown et al., 2020). Although highly performant, these black-box models are questioned on their robustness, which will condition their use in sensitive tasks. As these models become widely accessible, adversarial attacks have become a growing concern. The goal of this article is to study popular adversarial attack detection scores, mainly the max-softmax (Hendrycks and Gimpel, 2018) and a Mahalanobis distance-based score (Yoo et al., 2022), and to measure both their performance and their limitations. To this end, we introduce two scores: FtS (first-to-second) and Euclidian. The first is based on the softmax output of the classifier, while the second uses the output of its penultimate layer. These scores are designed to challenge the max-softmax and the Mahalanobis-based scores, respectively.
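As a rough illustration of the four scores named in the abstract, the sketch below computes each one with NumPy. Only the max-softmax definition (the largest softmax probability) is standard; everything else is an assumption made for illustration, since the abstract does not give formulas. In particular, FtS is assumed to be the gap between the largest and second-largest softmax probabilities, the Euclidian score is assumed to be the negated distance from a penultimate-layer feature vector to the nearest class mean, and the Mahalanobis score is assumed to use class means with a single shared covariance. The function names and the sign convention (a lower score suggests an adversarial input) are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def max_softmax_score(logits):
    """Largest softmax probability per sample."""
    return softmax(logits).max(axis=-1)


def fts_score(logits):
    """ASSUMED FtS definition: top-1 minus top-2 softmax probability."""
    p = np.sort(softmax(logits), axis=-1)
    return p[..., -1] - p[..., -2]


def fit_class_stats(features, labels, n_classes):
    """Class means and shared precision matrix from clean training features.

    features: (n, d) penultimate-layer outputs; labels: (n,) integer classes.
    Assumes every class has at least one sample.
    """
    means = np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    return means, precision


def euclidian_score(features, means):
    """ASSUMED definition: negated Euclidean distance to the nearest class mean."""
    diff = features[:, None, :] - means[None, :, :]  # shape (n, classes, d)
    return -np.sqrt((diff ** 2).sum(axis=-1)).min(axis=-1)


def mahalanobis_score(features, means, precision):
    """ASSUMED definition: negated Mahalanobis distance to the nearest class mean."""
    diff = features[:, None, :] - means[None, :, :]
    d2 = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    return -np.sqrt(np.maximum(d2, 0.0)).min(axis=-1)
```

Under these assumptions, the Euclidian score is the Mahalanobis score with the precision matrix replaced by the identity, which is one way to read the pairing described in the abstract: FtS is compared against max-softmax on the softmax output, and Euclidian against Mahalanobis on the penultimate-layer features. A detector would then flag inputs whose score falls below a threshold chosen on held-out data.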
