Keywords: machine learning, adversarial machine learning, computer vision, adversarial robustness, adversarial attacks
TL;DR: Minimum-norm gradient-based adversarial attack that works with multiple $\ell_p$ norms.
Abstract: Evaluating adversarial robustness amounts to finding the minimum perturbation needed to cause an input sample to be misclassified. Because the underlying optimization problem is inherently complex, current gradient-based attacks must be carefully tuned and initialized, and they may need many computationally demanding iterations, even when specialized to a given perturbation model. In this work, we overcome these limitations by proposing a fast minimum-norm (FMN) attack that works with different $\ell_p$-norm perturbation models ($p=0, 1, 2, \infty$), is robust to hyperparameter choices, does not require adversarial starting points, and converges within a few lightweight steps. It works by iteratively finding the sample misclassified with maximum confidence within an $\ell_p$-norm constraint of size $\epsilon$, while adapting $\epsilon$ to minimize the distance of the current sample to the decision boundary. Extensive experiments show that FMN significantly outperforms existing $\ell_0$-, $\ell_1$-, and $\ell_\infty$-norm attacks in terms of perturbation size, convergence speed, and computation time, while achieving performance comparable to state-of-the-art $\ell_2$-norm attacks. Our open-source code is available at: https://github.com/pralab/Fast-Minimum-Norm-FMN-Attack.
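The iterative scheme described in the abstract can be illustrated with a minimal NumPy sketch. It assumes an $\ell_2$ perturbation model, a toy linear classifier $z = Wx + b$, and the logit-difference margin $L = z_y - \max_{j \neq y} z_j$, which is negative when the sample is misclassified. Each step moves $\delta$ to decrease the margin, projects it onto the $\ell_2$ ball of radius $\epsilon$, and keeps $x+\delta$ in $[0,1]$. The radius $\epsilon$ comes from a linearized distance-to-boundary estimate until an adversarial point is found; after that it shrinks by $(1-\gamma)$ when the current point is adversarial and grows by $(1+\gamma)$ otherwise. The function names (`margin_and_grad`, `fmn_l2_sketch`), the cosine-annealed schedules, the constants, and the $(1+\gamma)$ inflation of the initial estimate are all assumptions of this sketch. They are not taken from the paper or the linked repository.

```python
import numpy as np


def margin_and_grad(W, b, x, y):
    """Logit-difference margin L = z_y - max_{j != y} z_j for z = W x + b.

    L < 0 means x is misclassified. Returns (L, dL/dx).
    Hypothetical helper for this sketch; linear models only.
    """
    z = W @ x + b
    z_other = z.copy()
    z_other[y] = -np.inf
    j = int(np.argmax(z_other))           # strongest competing class
    return z[y] - z[j], W[y] - W[j]       # gradient of a linear margin


def fmn_l2_sketch(W, b, x, y, steps=100, alpha0=1.0, gamma0=0.05):
    """Simplified FMN-style l2 attack (illustrative, not the reference code).

    Returns the smallest adversarial perturbation found, or None.
    """
    delta = np.zeros_like(x)
    eps = np.inf
    best, best_norm = None, np.inf
    for k in range(steps):
        # Assumed cosine annealing for both step sizes.
        cos = 0.5 * (1.0 + np.cos(np.pi * k / steps))
        alpha, gamma = alpha0 * cos, gamma0 * cos

        margin, grad = margin_and_grad(W, b, x + delta, y)
        norm = np.linalg.norm(delta)
        gnorm = np.linalg.norm(grad) + 1e-12
        if margin < 0:
            # Adversarial: record it if smaller, then shrink the radius.
            if norm < best_norm:
                best, best_norm = delta.copy(), norm
            eps = min(eps * (1.0 - gamma), best_norm)
        elif best is None:
            # No adversarial point yet: linear estimate of the distance to the
            # boundary, inflated by (1 + gamma) (an assumption of this sketch).
            eps = (norm + margin / gnorm) * (1.0 + gamma)
        else:
            # Lost adversariality after a shrink: enlarge the radius.
            eps = eps * (1.0 + gamma)

        # Normalized step that decreases the margin (toward class j).
        delta = delta - alpha * grad / gnorm
        # Project onto the l2 ball of radius eps.
        n = np.linalg.norm(delta)
        if n > eps:
            delta = delta * (eps / n)
        # Keep x + delta inside the [0, 1] input domain.
        delta = np.clip(x + delta, 0.0, 1.0) - x

    # Check the last iterate, which the loop has not evaluated yet.
    margin, _ = margin_and_grad(W, b, x + delta, y)
    if margin < 0 and np.linalg.norm(delta) < best_norm:
        best = delta.copy()
    return best


if __name__ == "__main__":
    W, b = np.eye(2), np.zeros(2)         # class 0 iff x[0] > x[1]
    x = np.array([0.7, 0.3])              # correctly classified as class 0
    delta = fmn_l2_sketch(W, b, x, y=0)
    print(np.linalg.norm(delta), np.argmax(W @ (x + delta) + b))
```

In this toy setting the point $x=(0.7, 0.3)$ lies at $\ell_2$ distance $0.4/\sqrt{2} \approx 0.283$ from the boundary $x_0 = x_1$, so the returned perturbation should have a norm slightly above that value and flip the prediction to class 1.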
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.