Toward Efficient Robust Training against Union of $\ell_p$ Threat ModelsDownload PDF

Published: 31 Oct 2022, Last Modified: 12 Oct 2022NeurIPS 2022 AcceptReaders: Everyone
Keywords: Adversarial Robustness, Adversarial Defense, Adversarial Training, Multiple Threat Models, Fast Adversarial Training, Efficient Adversarial Training, Single-Step Adversarial Training
Abstract: The overwhelming vulnerability of deep neural networks to carefully crafted perturbations known as adversarial attacks has led to the development of various training techniques to produce robust models. While the primary focus of existing approaches has been directed toward addressing the worst-case performance achieved under a single-threat model, it is imperative that safety-critical systems are robust with respect to multiple threat models simultaneously. Existing approaches that address worst-case performance under the union of such threat models ($\ell_{\infty}, \ell_2, \ell_1$) either utilize adversarial training methods that require multi-step attacks which are computationally expensive in practice, or rely upon fine-tuning of pre-trained models that are robust with respect to a single-threat model. In this work, we show that by carefully choosing the objective function used for robust training, it is possible to achieve similar, or improved worst-case performance over a union of threat models while utilizing only single-step attacks, thereby achieving a significant reduction in computational resources necessary for training. Furthermore, prior work showed that adversarial training specific to the $\ell_1$ threat model is relatively difficult, to the extent that even multi-step adversarially trained models were shown to be prone to gradient-masking. However, the proposed method—when applied on the $\ell_1$ threat model specifically—enables us to obtain the first $\ell_1$ robust model trained solely with single-step adversaries. Finally, to demonstrate the merits of our approach, we utilize a modern set of attack evaluations to better estimate the worst-case performance under the considered union of threat models.
Supplementary Material: pdf
18 Replies