Keywords: AI Safety, ML safety, adversarial robustness, distribution shift, unforeseen adversaries
TL;DR: A new benchmark for evaluating the robustness of neural networks to adversaries not seen during training.
Abstract: When considering real-world adversarial settings, defenders are unlikely to have access to the full range of deployment-time adversaries, and adversaries are likely to use realistic adversarial distortions that are not limited to small $L_p$-constrained perturbations. To narrow this discrepancy between research and reality, we introduce ImageNet-UA, a new benchmark for evaluating model robustness against a wide range of unforeseen adversaries. We use our benchmark to identify gaps in popular adversarial defense techniques and to highlight a rich space of methods that can improve unforeseen robustness. We hope the greater variety and realism of ImageNet-UA will make it a useful tool for those working on real-world worst-case robustness, enabling the development of more robust defenses that generalize beyond attacks seen during training.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11827