SAD: Saliency Adversarial Defense without Adversarial Training

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: Adversarial Robustness, Saliency Maps, Interpretability
Abstract: Adversarial training is one of the most effective methods for defending adversarial attacks, but it is computationally costly. In this paper, we propose Saliency Adversarial Defense (SAD), an efficient defense algorithm that avoids adversarial training. The saliency map is added to the input with a hybridization ratio to enhance those pixels that are important for making decisions. This process causes a distribution shift to the original data. Interestingly, we find that this shift can be effectively fixed by updating the statistics of batch normalization with the processed data without further training. We justify the algorithm with a linear model that the added saliency maps pull data away from its closest decision boundary. Updating BN effectively evolves the decision boundary to fit the new data. As a result, the distance between the decision boundary and the original inputs are increased such that the model is able to defend stronger attacks and thus improve robustness. Then we show in experiments that the results still hold for complex models and datasets. Our results demonstrate that SAD is superior in defending various attacks, including both white-box and black-box ones.
One-sentence Summary: We propose an interpretable defense method by processing data with saliency maps and then updating the statistics of batch normalization with the processed data, outperforming adversarial training.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=iyYr4NLRi
4 Replies

Loading