Mahalanobis and max-softmax: but why? A comprehensive study of the benchmark scores in Adversarial Attack Detection
Abstract: Transformers (Vaswani et al., 2017) and other Deep Learning architectures have gained considerable traction recently, as illustrated by the public release of ChatGPT, which builds on the GPT-3 line of models (Brown et al., 2020).
Although highly performant, these black-box models face open questions about their robustness, which will determine whether they can be used for sensitive tasks. As these models become more widespread, adversarial attacks have become a growing concern.
The goal of this article is to study popular adversarial attack detection scores, mainly the max-softmax (Hendrycks and Gimpel, 2018) and a Mahalanobis distance-based score (Yoo et al., 2022), and to measure both their performance and their limitations.
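As a rough illustration of the two baseline scores, the following Python/NumPy sketch computes the max-softmax from classifier logits and a class-conditional Mahalanobis score from penultimate-layer features. It is a minimal sketch, not the implementation evaluated in the article: it assumes a single shared covariance matrix estimated from clean training features, whereas Yoo et al. (2022) may use a different estimator or preprocessing. All function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def max_softmax_score(logits):
    # Baseline confidence: the largest class probability per input.
    # Low values suggest the input may be adversarial.
    return softmax(logits).max(axis=-1)

def fit_mahalanobis(features, labels):
    # Assumption: class means plus one shared (tied) covariance,
    # both estimated from clean training features.
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample on the mean of its own class.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    # Pseudo-inverse tolerates a singular covariance (e.g. few samples, many dims).
    precision = np.linalg.pinv(cov)
    return means, precision

def mahalanobis_score(features, means, precision):
    # Negative squared Mahalanobis distance to the closest class mean:
    # higher means "closer to the clean data", lower flags a possible attack.
    diffs = features[:, None, :] - means[None, :, :]  # shape (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    return -d2.min(axis=1)
```

For both functions, a lower score marks an input as more likely to be adversarial; a detector then thresholds the score, with the threshold chosen on held-out data.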
To this end, we introduce two scores, FtS (first-to-second) and Euclidean: the first is based on the softmax output of the classifier, while the second uses the output of its penultimate layer. They are designed to challenge the max-softmax and the Mahalanobis-based scores, respectively.
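The abstract does not define FtS and Euclidean precisely, so the sketch below rests on stated assumptions: FtS is taken to be the ratio of the largest to the second-largest softmax probability, and Euclidean the negative distance from a penultimate-layer feature vector to the nearest class mean. Both definitions and all function names are hypothetical and may differ from the scores studied in the article.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fts_score(logits):
    # HYPOTHETICAL reading of "first-to-second": ratio of the largest to the
    # second-largest softmax probability (always >= 1; higher = more confident).
    # Requires at least two classes.
    probs = np.sort(softmax(logits), axis=-1)
    return probs[:, -1] / probs[:, -2]

def class_means(features, labels):
    # Mean penultimate-layer feature vector of each class (clean data).
    classes = np.unique(labels)
    return np.stack([features[labels == c].mean(axis=0) for c in classes])

def euclidean_score(features, means):
    # HYPOTHETICAL: negative Euclidean distance to the nearest class mean,
    # i.e. a Mahalanobis-style score with the identity as covariance.
    dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=-1)
    return -dists.min(axis=1)
```

Under this reading, FtS equals exp(z1 - z2) for the two largest logits z1 and z2, so it depends only on the top-two margin and ignores the remaining classes, unlike the max-softmax. Likewise, the Euclidean score drops the covariance estimate entirely, which makes it cheaper and free of matrix inversion but insensitive to how features are spread within each class.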