Abstract: Adversarial examples remain a problem for contemporary neural networks. This paper draws on Background Check (Perello-Nieto et al., 2016), a technique from model calibration, to help two-class neural networks detect adversarial examples, using the one-dimensional difference between logit values as the underlying measure. Interestingly, the method tends to achieve its highest average recall on image sets generated with large perturbation vectors, in contrast to findings in the existing literature on adversarial attacks (Cubuk et al., 2017). Unlike much of the literature that uses deep-learning-based methods to detect adversarial examples, such as Metzen et al. (2017), the proposed method requires no knowledge of the attack parameters or methods at training time, which gives it additional flexibility.
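To make the idea concrete, the following is a minimal sketch of how a Background-Check-style detector over the one-dimensional logit difference might look. It is not the paper's implementation: the function names, the Gaussian KDE for the clean-data density, the constant background density, and the threshold are all illustrative assumptions, chosen as one simple way to instantiate the general technique.

```python
# Hypothetical sketch: detecting adversarial inputs via the 1-D logit difference
# and a Background-Check-style posterior. Names and constants are illustrative,
# not the paper's actual implementation.
import numpy as np
from scipy.stats import gaussian_kde


def logit_difference(logits):
    """Reduce two-class logits of shape (N, 2) to the 1-D measure z = logit_1 - logit_0."""
    return logits[:, 1] - logits[:, 0]


def fit_clean_density(clean_logits):
    """Fit a density estimate of the logit difference on clean (non-adversarial) data."""
    return gaussian_kde(logit_difference(clean_logits))


def background_posterior(test_logits, clean_density, background_density=1e-3):
    """
    Background-Check-style score: posterior probability that an input comes from
    the 'background' (here treated as adversarial / out-of-distribution) rather
    than from the clean logit-difference distribution. A constant background
    density is one simple instantiation; the original technique admits others.
    """
    f = clean_density(logit_difference(test_logits))  # in-distribution density f(z)
    b = background_density                            # flat background density b(z)
    return b / (b + f)


def flag_adversarial(test_logits, clean_density, threshold=0.5):
    """Flag inputs whose background posterior exceeds the threshold."""
    return background_posterior(test_logits, clean_density) > threshold


# Illustrative usage with synthetic logits standing in for real network outputs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(loc=[-2.0, 2.0], scale=1.0, size=(500, 2))      # typical clean logits
    suspect = rng.normal(loc=[3.0, -3.0], scale=0.3, size=(10, 2))     # logits far from the clean distribution
    density = fit_clean_density(clean)
    print(flag_adversarial(suspect, density))
```

Note that the detector only needs the clean logit differences at training time, which mirrors the abstract's point that no knowledge of the attack parameters or methods is required.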
Keywords: Adversarial attacks, calibration, probability, adversarial defence
TL;DR: This paper applies principles from machine learning calibration to the logits of a neural network to defend against adversarial attacks