- Abstract: Deep neural networks are vulnerable to adversarial examples, which can mislead classifiers by adding imperceptible perturbations. An intriguing property of adversarial examples is their good transferability, making black-box attacks feasible in real-world applications. Due to the threat of adversarial attacks, many methods have been proposed to improve the robustness, and several state-of-the-art defenses are shown to be robust against transferable adversarial examples. In this paper, we identify the attention shift phenomenon, which may hinder the transferability of adversarial examples to the defense models. It indicates that the defenses rely on different discriminative regions to make predictions compared with normally trained models. Therefore, we propose an attention-invariant attack method to generate more transferable adversarial examples. Extensive experiments on the ImageNet dataset validate the effectiveness of the proposed method. Our best attack fools eight state-of-the-art defenses at an 82% success rate on average based only on the transferability, demonstrating the insecurity of the defense techniques.
- Keywords: adversarial examples, black-box attack, transferability
- TL;DR: We propose an attention-invariant attack method to generate more transferable adversarial examples for black-box attacks, which can fool state-of-the-art defenses with a high success rate.