Adversarial robustness against multiple $l_p$-threat models at the price of one and how to quickly fine-tune robust models to another threat model

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: adversarial robustness, multiple norms, adversarial training, fine-tuning
Abstract: Adversarial training (AT) to achieve adversarial robustness with respect to a single $l_p$-threat model has been discussed extensively. However, for safety-critical systems adversarial robustness should be achieved with respect to all $l_p$-threat models simultaneously. In this paper we develop a simple and efficient training scheme, extreme norm adversarial training (E-AT), to achieve adversarial robustness against the union of $l_p$-threat models. E-AT is based on geometric considerations of the different $l_p$-balls and costs as much as standard adversarial training against a single $l_p$-threat model. Moreover, we show that with E-AT one can fine-tune \emph{any} $l_p$-robust model (for $p \in \{1,2,\infty\}$) with just three epochs and achieve multiple-norm adversarial robustness. In this way we boost the state of the art for multiple-norm robustness on CIFAR-10 to more than $51\%$ and report, to the best of our knowledge, the first ImageNet models with multiple-norm robustness. Finally, we study the general transfer of adversarial robustness between different threat models and thereby improve the previous state-of-the-art $l_1$-robustness on CIFAR-10 by almost $10\%$.
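The abstract only sketches the method, but the core idea of E-AT — training each batch against one of the two extreme threat models, $l_1$ or $l_\infty$, chosen at random, so that the cost matches single-threat-model AT — lends itself to a short illustration. The PyTorch sketch below is a hypothetical minimal implementation, not the authors' code: the plain PGD attack, the step sizes, the per-batch sampling, and the radii `eps_linf`/`eps_l1` are assumptions (the paper relies on stronger attacks and its own schedule).

```python
import torch
import torch.nn.functional as F

def project_l1(delta, eps):
    # Euclidean projection of each example's perturbation onto the l1-ball
    # of radius eps (sorting-based algorithm of Duchi et al., 2008).
    flat = delta.reshape(delta.size(0), -1).clone()
    over = flat.abs().sum(dim=1) > eps
    if over.any():
        v = flat[over].abs()
        mu, _ = v.sort(dim=1, descending=True)
        cs = mu.cumsum(dim=1)
        k = torch.arange(1, v.size(1) + 1, device=v.device, dtype=v.dtype)
        # rho: last (0-indexed) position where mu_k > (cs_k - eps) / k
        rho = (mu * k > cs - eps).to(v.dtype).cumsum(dim=1).argmax(dim=1)
        theta = (cs.gather(1, rho[:, None]).squeeze(1) - eps) / (rho.to(v.dtype) + 1)
        flat[over] = torch.clamp(v - theta[:, None], min=0) * flat[over].sign()
    return flat.view_as(delta)

def pgd(model, x, y, p, eps, alpha, steps=10):
    # Plain PGD wrt the l_p-threat model, p in {1, inf}; a simplified
    # stand-in for the stronger attacks used in the paper.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        g = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            if p == float('inf'):
                delta += alpha * g.sign()       # steepest ascent wrt l_inf
                delta.clamp_(-eps, eps)         # project onto l_inf ball
            else:
                # Dense gradient step of l1-length alpha*eps (a simplification;
                # true l1 steepest ascent would update few coordinates).
                g_norm = g.reshape(g.size(0), -1).abs().sum(dim=1)
                g = g / (g_norm.view(-1, *([1] * (g.dim() - 1))) + 1e-12)
                delta += alpha * eps * g
                delta.copy_(project_l1(delta, eps))
            delta.clamp_(min=-x, max=1 - x)     # keep x + delta in [0, 1]
    return (x + delta).detach()

def eat_epoch(model, loader, opt, device, eps_linf=8 / 255, eps_l1=12.0):
    # One epoch of (hypothetical) extreme-norm adversarial training: each
    # batch is trained against l1 OR l_inf, sampled uniformly at random,
    # so one epoch costs the same as ordinary single-threat-model AT.
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.eval()                            # fixed BN stats during the attack
        if torch.rand(()) < 0.5:
            x_adv = pgd(model, x, y, p=float('inf'), eps=eps_linf, alpha=2 / 255)
        else:
            x_adv = pgd(model, x, y, p=1, eps=eps_l1, alpha=0.2)
        model.train()
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        opt.step()
```

Under this reading, the three-epoch fine-tuning from the abstract would amount to loading an existing $l_p$-robust checkpoint and running this loop for three epochs at a small learning rate; the exact schedule and attack configuration are the paper's and are not reproduced here.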
One-sentence Summary: We propose a version of adversarial training for fast multiple-norm robustness, and transfer robustness among threat models via fine-tuning.
Supplementary Material: zip