Evaluating Robustness to Unforeseen Adversarial Attacks

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: ML safety, adversarial robustness, distribution shift, unforeseen adversaries
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A new benchmark for evaluating the robustness of neural networks to adversaries not seen during training.
Abstract: When considering real-world adversarial settings, defenders are unlikely to have access to the full range of deployment-time adversaries during training, and adversaries are likely to use realistic adversarial distortions that will not be limited to small $L_p$-constrained perturbations. To narrow this discrepancy between research and reality, we introduce eighteen novel adversarial attacks, which we use to create ImageNet-UA, a new benchmark for evaluating model robustness against a wide range of unforeseen adversaries. We use our benchmark to identify a range of defense strategies that help overcome this generalization gap, finding a rich space of techniques that improve unforeseen robustness. We hope the greater variety and realism of ImageNet-UA will make it a useful tool for those working on real-world worst-case robustness, enabling the development of more robust defenses which can generalize beyond attacks seen during training.
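The submission does not spell out the evaluation protocol here, but a minimal sketch of how a model might be scored against a held-out suite of distortions could look like the following. The `gaussian_noise` and `brightness_shift` functions and the `evaluate_unforeseen` helper are hypothetical stand-ins for illustration only; they are not the eighteen attacks or the harness from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in distortions (NOT the paper's attacks): simple corruptions
# used only to illustrate the shape of an "unforeseen adversary" evaluation loop.
def gaussian_noise(x: torch.Tensor, severity: float = 0.1) -> torch.Tensor:
    """Additive Gaussian noise, clipped back to the valid [0, 1] image range."""
    return (x + severity * torch.randn_like(x)).clamp(0.0, 1.0)

def brightness_shift(x: torch.Tensor, severity: float = 0.2) -> torch.Tensor:
    """Uniform brightness increase, clipped to [0, 1]."""
    return (x + severity).clamp(0.0, 1.0)

@torch.no_grad()
def evaluate_unforeseen(model: nn.Module, loader, attacks: dict) -> dict:
    """Return per-attack accuracy for a model on distorted inputs.

    `attacks` maps an attack name to a callable that perturbs a batch of images;
    none of these attacks are assumed to have been seen during training.
    """
    model.eval()
    results = {}
    for name, attack in attacks.items():
        correct, total = 0, 0
        for images, labels in loader:
            preds = model(attack(images)).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        results[name] = correct / total
    return results
```

A real harness would presumably also sweep attack severities and aggregate per-attack accuracies into a single unforeseen-robustness score, but those details are assumptions rather than something stated on this page.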
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7760