Keywords: AI Safety, ML safety, adversarial robustness, distribution shift, unforeseen adversaries
TL;DR: A new benchmark for evaluating the robustness of neural networks to adversaries not seen during training.
Abstract: When considering real-world adversarial settings, defenders are unlikely to have access to the full range of deployment-time adversaries, and adversaries are likely to use realistic adversarial distortions that are not limited to small $L_p$-constrained perturbations. To narrow this discrepancy between research and reality, we introduce ImageNet-UA, a new benchmark for evaluating model robustness against a wide range of unforeseen adversaries. We use our benchmark to identify gaps in popular adversarial defense techniques and to highlight a rich space of methods that can improve unforeseen robustness. We hope the greater variety and realism of ImageNet-UA will make it a useful tool for those working on real-world worst-case robustness, enabling the development of more robust defenses that generalize beyond attacks seen during training.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11827