Buffer Zone based Defense against Adversarial Examples in Image Classification

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Abstract: Recent defenses published at venues like NIPS, ICML, ICLR and CVPR are mainly focused on mitigating white-box attacks and do not properly consider adaptive adversaries. In this paper, we expand the scope of these defenses to include adaptive black-box adversaries. Based on our study of these defenses, we make three contributions. First, we propose a new metric for evaluating adversarial robustness when clean accuracy is impacted. Second, we create an enhanced adaptive black-box attack. Third and most significantly, we develop a novel defense against these adaptive black-box attacks. Our defense is based on a combination of deep neural networks and simple image transformations. While straightforward to implement, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses, and we verify our claims through extensive experimentation. Our results encompass three adversarial models (10 different black-box attacks) on 11 defenses with two datasets (CIFAR-10 and Fashion-MNIST).
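The abstract describes the defense only at a high level: multiple deep networks combined with simple image transformations, with disagreement regions ("buffer zones") providing the security property. A minimal sketch of one plausible reading, under assumed details not given in the abstract (toy stand-in classifiers, brightness-shift transformations, and a unanimous-vote rule that flags any disagreement instead of emitting a class), could look like:

```python
import numpy as np

def shift_transform(offset):
    """Hypothetical fixed per-model image transformation: a brightness shift."""
    def t(x):
        return np.clip(x + offset, 0.0, 1.0)
    return t

def make_classifier(threshold):
    """Toy stand-in for a trained network: classifies by mean pixel intensity."""
    def f(x):
        return 1 if x.mean() > threshold else 0
    return f

def buffer_zone_predict(x, members):
    """Each member applies its own transformation before classifying.
    Any disagreement places the input in the 'buffer zone': it is
    flagged (-1) rather than assigned a class."""
    preds = {clf(t(x)) for t, clf in members}
    if len(preds) == 1:
        return preds.pop()
    return -1  # buffer zone: no unanimous vote, input flagged

# Two members with different (assumed) transformations and decision boundaries.
members = [(shift_transform(0.0), make_classifier(0.6)),
           (shift_transform(0.1), make_classifier(0.5))]

clean = np.full((4, 4), 0.9)       # all members agree -> class 1
ambiguous = np.full((4, 4), 0.55)  # members disagree -> flagged (-1)
```

This is only an illustration of the voting-with-abstention idea; the paper's actual transformations, architectures, and decision rule are specified in the full text, not the abstract.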
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=j3WpvwhIG_1
