Perturbation Type Categorization for Multiple $\ell_p$ Bounded Adversarial Robustness

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: adversarial examples, robustness, multiple perturbation types
Abstract: Despite recent advances in $\textit{adversarial training}$ based defenses, deep neural networks remain vulnerable to adversarial attacks outside the perturbation type they are trained to be robust against. Recent works have proposed defenses that improve the robustness of a single model against the union of multiple perturbation types. However, when evaluated against each individual attack, these methods still suffer significant trade-offs compared to models specifically trained to be robust against that perturbation type. In this work, we introduce the problem of categorizing adversarial examples based on their $\ell_p$ perturbation type. Based on our analysis, we propose $\textit{PROTECTOR}$, a two-stage pipeline that improves robustness against multiple perturbation types. Instead of training a single predictor, $\textit{PROTECTOR}$ first categorizes the perturbation type of the input, and then uses a predictor specifically trained against the predicted perturbation type to make the final prediction. We first show theoretically that adversarial examples created with different perturbation types follow different distributions, which makes it possible to distinguish them. Further, we show that at test time the adversary faces a natural trade-off between fooling the perturbation type classifier and fooling the subsequent predictor trained with perturbation-specific adversarial training, which makes it difficult to mount strong attacks against the whole pipeline. In addition, we demonstrate how this trade-off can be realized in deep networks by adding random noise to the model input at test time, which improves robustness against strong adaptive attacks. Extensive experiments on MNIST and CIFAR-10 show that $\textit{PROTECTOR}$ outperforms prior adversarial training based defenses by over $5\%$ when tested against the union of $\ell_1$, $\ell_2$, and $\ell_\infty$ attacks.
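For concreteness, below is a minimal sketch of the two-stage pipeline described in the abstract, assuming a PyTorch-style setup. The module names (`type_classifier`, `experts`), the three-way $\ell_1$/$\ell_2$/$\ell_\infty$ type classifier, and the Gaussian noise scale are illustrative assumptions rather than the authors' implementation; the sketch only shows stage 1 (perturbation type categorization on a randomly perturbed copy of the input) routing each example to stage 2 (a predictor adversarially trained against the predicted perturbation type).

```python
import torch
import torch.nn as nn


class Protector(nn.Module):
    """Sketch of the two-stage pipeline: categorize the perturbation type,
    then route the input to a predictor adversarially trained against it.
    All names and the noise scale are illustrative assumptions."""

    def __init__(self, type_classifier, experts, noise_std=0.1):
        super().__init__()
        self.type_classifier = type_classifier  # outputs logits over {l1, l2, linf}
        self.experts = nn.ModuleList(experts)   # experts[i] trained against type i
        self.noise_std = noise_std              # test-time input noise (assumed value)

    def forward(self, x):
        # Stage 1: predict the perturbation type from a noisy copy of the input;
        # the added random noise is the test-time randomization mentioned in the
        # abstract, which hinders adaptive attacks on the type classifier.
        x_noisy = x + self.noise_std * torch.randn_like(x)
        ptype = self.type_classifier(x_noisy).argmax(dim=1)

        # Stage 2: classify each example with the expert matching its predicted type.
        outputs = [self.experts[t](xi.unsqueeze(0)).squeeze(0)
                   for t, xi in zip(ptype.tolist(), x)]
        return torch.stack(outputs)
```

This setup illustrates the trade-off the abstract describes: an attack strong enough to mislead the routed expert tends to be easy for the type classifier to categorize correctly, while an attack that evades categorization is routed to an expert already hardened against it.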
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We introduce a method that performs perturbation type categorization to improve robustness against multiple perturbation types.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=xRbXv6cd6D