Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Francesco Croce; Matthias Hein

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: We introduce a white-box adversarial attack with respect to the $l_1$-, $l_2$- and $l_\infty$-norms that achieves state-of-the-art performance, minimizes the norm of the perturbations, and is computationally cheap.

Abstract: The robustness of neural-network-based classifiers against adversarial manipulations is mainly evaluated with empirical attacks, since methods for exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack with respect to the $l_p$-norms for $p \in \{1,2,\infty\}$ that aims to find the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, quickly yields high-quality results, and minimizes the size of the perturbation, so that a single run returns the robust accuracy at every threshold. It performs better than or similarly to state-of-the-art attacks, which are partially specialized to one $l_p$-norm.
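The claim that one run of a minimum-norm attack yields the robust accuracy at every threshold can be illustrated with a short sketch. It is not the authors' implementation: the function name `robust_accuracy_curve` and its inputs are hypothetical, and it assumes the attack has already produced, for each test point, the norm of the smallest adversarial perturbation it found (`np.inf` when none was found). Because the attack is empirical, the resulting values are upper bounds on the true robust accuracy.

```python
import numpy as np

def robust_accuracy_curve(min_norms, correct, thresholds):
    """Robust accuracy at each threshold from per-example minimal norms.

    min_norms[i]: norm of the smallest adversarial perturbation found for
                  example i (np.inf if the attack found none).
    correct[i]:   True if the unperturbed example i is classified correctly.
    thresholds:   perturbation budgets (epsilon values) to evaluate.
    """
    min_norms = np.asarray(min_norms, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    thresholds = np.asarray(thresholds, dtype=float)
    # An example counts as robust at epsilon if it is correctly classified
    # and no adversarial perturbation with norm <= epsilon was found.
    robust = correct[None, :] & (min_norms[None, :] > thresholds[:, None])
    return robust.mean(axis=1)

# Illustrative (made-up) attack output for four test points.
norms = [0.5, 1.2, np.inf, 0.1]
is_correct = [True, True, True, False]
print(robust_accuracy_curve(norms, is_correct, [0.3, 1.0, 2.0]))
```

An attack that only searches within a fixed budget would instead have to be rerun for each threshold of interest.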

Keywords: adversarial attacks, adversarial robustness

Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/minimally-distorted-adversarial-examples-with/code)

