One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features

Published: 28 Jan 2022, Last Modified: 22 Oct 2023
Venue: ICLR 2022 Submitted
Keywords: adversaries, interpretability, generative modeling
Abstract: It is well understood that modern deep networks are vulnerable to adversarial attacks. However, conventional methods fail to produce adversarial perturbations that are intelligible to humans, and they pose limited threats in the physical world. To study feature-class associations in networks and better understand the real-world threats they face, we develop feature-level adversarial perturbations using deep image generators and a novel optimization objective. We term these feature-fool attacks. We show that they are versatile and use them to generate targeted feature-level attacks at the ImageNet scale that are simultaneously interpretable, universal to any source image, and physically-realizable. These attacks can also reveal spurious, semantically-describable feature-class associations, and we use them to guide the design of "copy/paste" adversaries in which one natural image is pasted into another to cause a targeted misclassification.
One-sentence Summary: Using generative models to create interpretable and versatile adversarial features.
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2110.03605/code)
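To give a concrete picture of the kind of optimization the abstract describes, below is a minimal, hedged sketch rather than the authors' released code. It assumes a pretrained generator split into two hypothetical callables, `G_front` (latent to intermediate features) and `G_back` (features to image), a classifier `f`, and it substitutes a plain targeted cross-entropy loss for the paper's novel objective. Optimizing a single perturbation in the generator's feature space over fresh latents each step is what makes the resulting attack universal across source images.

```python
# Hedged sketch of a feature-level ("feature-fool") attack, not the authors' method verbatim.
# Assumed components: G_front (latent -> intermediate generator features),
# G_back (features -> image), classifier f, and a simple targeted loss.
import torch
import torch.nn.functional as F

def feature_fool(G_front, G_back, f, target_class, steps=500, lr=0.05,
                 batch=16, z_dim=128):
    delta, opt = None, None  # universal perturbation in feature space
    for _ in range(steps):
        z = torch.randn(batch, z_dim)           # fresh latents each step => universality
        feats = G_front(z)                      # intermediate generator features
        if delta is None:                       # lazily match the feature shape
            delta = torch.zeros_like(feats[:1], requires_grad=True)
            opt = torch.optim.Adam([delta], lr=lr)
        x_adv = G_back(feats + delta)           # synthesize image from perturbed features
        logits = f(x_adv)
        # Targeted attack: push every generated image toward the target class.
        target = torch.full((batch,), target_class, dtype=torch.long)
        loss = F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```

Because the perturbation lives in the generator's feature space rather than pixel space, the induced image changes tend to be semantically coherent (interpretable) rather than high-frequency noise; the paper's interpretability, universality, and physical-realizability claims rest on its own objective and setup beyond this sketch.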