- TL;DR: We propose several new attacks and a methodology to measure robustness against unforeseen adversarial attacks.
- Abstract: Most existing defenses against adversarial attacks only consider robustness to L_p-bounded distortions. In reality, the specific attack is rarely known in advance and adversaries are free to modify images in ways which lie outside any fixed distortion model; for example, adversarial rotations lie outside the set of L_p-bounded distortions. In this work, we advocate measuring robustness against a much broader range of unforeseen attacks, attacks whose precise form is unknown during defense design. We propose several new attacks and a methodology for evaluating a defense against a diverse range of unforeseen distortions. First, we construct novel adversarial JPEG, Fog, Gabor, and Snow distortions to simulate more diverse adversaries. We then introduce UAR, a summary metric that measures the robustness of a defense against a given distortion. Using UAR to assess robustness against existing and novel attacks, we perform an extensive study of adversarial robustness. We find that evaluation against existing L_p attacks yields redundant information which does not generalize to other attacks; we instead recommend evaluating against our significantly more diverse set of attacks. We further find that adversarial training against either one or multiple distortions fails to confer robustness to attacks with other distortion types. These results underscore the need to evaluate and study robustness against unforeseen distortions.
- Code: https://github.com/iclr-2020-submission/advex-uar
- Keywords: adversarial examples, adversarial training, adversarial attacks