Abstract: Deep neural networks are vulnerable to adversarial examples crafted by adding imperceptible perturbations to original inputs. Despite their strong attack performance in the white-box setting, most existing adversarial attack methods achieve only low success rates against black-box models. In response, a class of image-transformation-based attacks has been proposed, whose main idea is to apply transformations to adversarial examples during attack iterations, thereby improving transferability to black-box models. However, a major limitation of these transformation-based attacks is that they apply transformations only to input images, ignoring transformations of hidden representations. Motivated by our observation that mixup in hidden space yields higher transferability than mixup in input space, we propose the Random-Layer Mixup Attack Method (RLMAM). Our method interpolates adversarial examples with clean examples in both input space and hidden space. The interpolated adversarial representations induced by random-layer mixup increase representation diversity in both spaces and alleviate the overfitting of adversarial examples to the white-box model. Furthermore, we combine RLMAM with our enhanced momentum method. Experimental results on the ImageNet and CIFAR-10 datasets demonstrate that RLMAM outperforms other state-of-the-art black-box attacks.
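The core operation described above, interpolating the adversarial branch with the clean branch at a randomly chosen layer (layer 0 corresponding to input space, deeper layers to hidden space), can be sketched as a forward pass over a layer-wise model. This is a minimal illustrative sketch of the idea, not the authors' implementation; the function name, the fixed interpolation weight `lam`, and the `layers`-as-callables interface are all assumptions.

```python
import numpy as np

def random_layer_mixup_forward(x_adv, x_clean, layers, lam=0.9, rng=None):
    """Forward pass with mixup applied at one randomly chosen layer.

    x_adv, x_clean : adversarial and clean inputs (same shape)
    layers         : list of callables forming a feed-forward model
    lam            : interpolation weight kept on the adversarial branch
    k = 0 mixes in input space; k > 0 mixes a hidden representation.
    Illustrative sketch only -- names and defaults are assumptions.
    """
    rng = rng or np.random.default_rng()
    k = int(rng.integers(0, len(layers)))  # layer index chosen for mixup
    h_adv, h_clean = x_adv, x_clean
    for i, layer in enumerate(layers):
        if i == k:
            # interpolate the adversarial representation with the clean one
            h_adv = lam * h_adv + (1.0 - lam) * h_clean
        h_adv = layer(h_adv)
        h_clean = layer(h_clean)
    return h_adv
```

In an actual attack loop, the loss would be computed on the mixed branch and differentiated with respect to the adversarial input; the random layer choice per iteration is what diversifies the representations seen by the attack.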