- Keywords: Adversarial Examples, Adversarial Robustness, Generative Adversarial Networks, Image Classification
- TL;DR: Training GANs to generate unrestricted adversarial examples
- Abstract: Neural networks are vulnerable to adversarially-constructed perturbations of their inputs. Most research so far has considered perturbations of a fixed magnitude under some $l_p$ norm. Although studying these attacks is valuable, there has been increasing interest in the construction of—and robustness to—unrestricted attacks, which are not constrained to a small and rather artificial subset of all possible adversarial inputs. We introduce a novel algorithm for generating such unrestricted adversarial inputs which, unlike prior work, is adaptive: it is able to tune its attacks to the classifier being targeted. It also offers a 400–2,000× speedup over the existing state of the art. We demonstrate our approach by generating unrestricted adversarial inputs that fool classifiers robust to perturbation-based attacks. We also show that, by virtue of being adaptive and unrestricted, our attack is able to bypass adversarial training against it.