Transferable Perturbations of Deep Feature Distributions

Nathan Inkawhich; Kevin Liang; Lawrence Carin; Yiran Chen

Published: 20 Dec 2019, Last Modified: 12 Oct 2025 | ICLR 2020 Conference Blind Submission | Readers: Everyone

Abstract: Almost all current adversarial attacks on CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority on explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.

Keywords: adversarial attacks, transferability, interpretability

TL;DR: We show that perturbations based on intermediate feature distributions yield more transferable adversarial examples and allow for analysis of the effects of adversarial perturbations on intermediate representations.

Data: [ImageNet](https://paperswithcode.com/dataset/imagenet)

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/transferable-perturbations-of-deep-feature/code)
