- Abstract: Deep neural networks provide state-of-the-art performance for many applications of interest. Unfortunately they are known to be vulnerable to adversarial examples, formed by applying small but malicious perturbations to the original inputs. Moreover, the perturbations can transfer across models: adversarial examples generated for a specific model will often mislead other unseen models. Consequently the adversary can leverage it to attack against the deployed black-box systems. In this work, we demonstrate that the adversarial perturbation can be decomposed into two components: model-specific and data-dependent one, and it is the latter that mainly contributes to the transferability. Motivated by this understanding, we propose to craft adversarial examples by utilizing the noise reduced gradient (NRG) which approximates the data-dependent component. Experiments on various classification models trained on ImageNet demonstrates that the new approach enhances the transferability dramatically. We also find that low-capacity models have more powerful attack capability than high-capacity counterparts, under the condition that they have comparable test performance. These insights give rise to a principled manner to construct adversarial examples with high success rates and could potentially provide us guidance for designing effective defense approaches against black-box attacks.
- TL;DR: We propose a new method for enhancing the transferability of adversarial examples by using the noise-reduced gradient.
- Keywords: black-box attack, adversarial example, deep learning, transferability