Non-robust Features through the Lens of Universal PerturbationsDownload PDF

28 Sept 2020 (modified: 05 May 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone
Keywords: adversarial examples, robustness, non-robust features
Abstract: Recent work ties adversarial examples to existence of non-robust features: features which are susceptible to small perturbations and believed to be unintelligible to humans, but still useful for prediction. We study universal adversarial perturbations and demonstrate that the above picture is more nuanced. Specifically, even though universal perturbations---similarly to standard adversarial perturbations---do leverage non-robust features, these features tend to be fundamentally different from the ``standard'' ones and, in particular, non-trivially human-aligned. Namely, universal perturbations have more human-aligned locality and spatial invariance properties. However, we also show that these human-aligned non-robust features have much less predictive signal than general non-robust features. Our findings thus take a step towards improving our understanding of these previously unintelligible features.
One-sentence Summary: We analyze non-robust features through universal perturbations, and find evidence of weak yet human-aligned non-robust features.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Nm0b8puMi
14 Replies

Loading