Exploiting Safe Spots in Neural Networks for Preemptive Robustness and Out-of-Distribution Detection

28 Sept 2020 (modified: 05 May 2023)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Keywords: adversarial defense, out-of-distribution detection
Abstract: Recent advances in adversarial defense mainly focus on improving the classifier's robustness against adversarially perturbed inputs. In this paper, we turn our attention from classifiers to inputs and explore whether there exist safe spots in the vicinity of natural images that are robust to adversarial attacks. To this end, we introduce a novel bi-level optimization algorithm that finds safe spots for over 90% of the correctly classified images of adversarially trained classifiers on the CIFAR-10 and ImageNet datasets. Our experiments also show that these safe spots can be used to improve both the empirical and certified robustness of smoothed classifiers. Furthermore, by combining a novel safe spot inducing model training scheme with our safe spot generation method, we propose a new out-of-distribution detection algorithm that achieves state-of-the-art results on near-distribution outliers.
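For intuition, the bi-level search for a safe spot can be sketched as follows: an inner loop approximates the worst-case adversarial perturbation around the current candidate point, and an outer loop moves the candidate to reduce that worst-case loss. The PyTorch-style sketch below is illustrative only; the perturbation radii, step sizes, and update rules are assumptions for exposition, not the authors' exact algorithm.

import torch
import torch.nn.functional as F

def find_safe_spot(model, x, y, outer_eps=8/255, inner_eps=8/255,
                   outer_steps=100, inner_steps=10, step_size=2/255):
    # delta: candidate safe-spot displacement, kept within an L-inf ball of radius outer_eps around x.
    delta = torch.zeros_like(x)
    for _ in range(outer_steps):
        center = (x + delta).detach()
        # Inner maximization: PGD-style worst-case perturbation around the current candidate.
        eps = torch.zeros_like(x)
        for _ in range(inner_steps):
            eps.requires_grad_(True)
            loss = F.cross_entropy(model(center + eps), y)
            grad, = torch.autograd.grad(loss, eps)
            eps = (eps + step_size * grad.sign()).clamp(-inner_eps, inner_eps).detach()
        # Outer minimization: nudge the candidate so the worst-case loss decreases.
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta + eps), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - step_size * grad.sign()).clamp(-outer_eps, outer_eps).detach()
        delta = ((x + delta).clamp(0, 1) - x).detach()  # keep pixel values in [0, 1]
    return (x + delta).detach()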
One-sentence Summary: We define a new problem in the adversarial robustness of neural networks, named preemptive robustness, and develop a novel algorithm to improve it.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=hkiG4wijn