Abstract: Adversarial examples remain a problem for contemporary neural networks. This paper draws on Background Check (Perello-Nieto et al., 2016), a technique from model calibration, to help two-class neural networks detect adversarial examples, using the one-dimensional difference between logit values as the underlying measure. Interestingly, the method tends to achieve its highest average recall on image sets generated with large perturbation vectors, in contrast to findings in the existing literature on adversarial attacks (Cubuk et al., 2017). Unlike much of the literature that uses deep-learning-based methods to detect adversarial examples, such as Metzen et al. (2017), the proposed method requires no knowledge of the attack parameters or methods at training time, which gives it additional flexibility.
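To make the idea concrete, the following is a minimal sketch of how a Background-Check-style detector over the one-dimensional logit difference might look. It is not the paper's implementation: the function names, the Gaussian KDE for the clean-data density, the constant background density, and the threshold are all illustrative assumptions, chosen as one simple way to instantiate the general technique.

```python
# Hypothetical sketch: detecting adversarial inputs via the 1-D logit difference
# and a Background-Check-style posterior. Names and constants are illustrative,
# not the paper's actual implementation.
import numpy as np
from scipy.stats import gaussian_kde


def logit_difference(logits):
    """Reduce two-class logits of shape (N, 2) to the 1-D measure z = logit_1 - logit_0."""
    return logits[:, 1] - logits[:, 0]


def fit_clean_density(clean_logits):
    """Fit a density estimate of the logit difference on clean (non-adversarial) data."""
    return gaussian_kde(logit_difference(clean_logits))


def background_posterior(test_logits, clean_density, background_density=1e-3):
    """
    Background-Check-style score: posterior probability that an input comes from
    the 'background' (here treated as adversarial / out-of-distribution) rather
    than from the clean logit-difference distribution. A constant background
    density is one simple instantiation; the original technique admits others.
    """
    f = clean_density(logit_difference(test_logits))  # in-distribution density f(z)
    b = background_density                            # flat background density b(z)
    return b / (b + f)


def flag_adversarial(test_logits, clean_density, threshold=0.5):
    """Flag inputs whose background posterior exceeds the threshold."""
    return background_posterior(test_logits, clean_density) > threshold


# Illustrative usage with synthetic logits standing in for real network outputs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(loc=[-2.0, 2.0], scale=1.0, size=(500, 2))      # typical clean logits
    suspect = rng.normal(loc=[3.0, -3.0], scale=0.3, size=(10, 2))     # logits far from the clean distribution
    density = fit_clean_density(clean)
    print(flag_adversarial(suspect, density))
```

Note that the detector only needs the clean logit differences at training time, which mirrors the abstract's point that no knowledge of the attack parameters or methods is required.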
Keywords: Adversarial attacks, calibration, probability, adversarial defence
TL;DR: This paper applies principles from machine learning calibration to the logits of a neural network to defend against adversarial attacks