How Aggressive Can Adversarial Attacks Be: Learning Ordered Top-k Attacks

Sep 25, 2019 Withdrawn Submission readers: everyone
• Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, especially white-box targeted attacks. This paper studies the problem of how aggressive white-box targeted attacks can be to go beyond widely used Top-1 attacks. We propose to learn ordered Top-k attacks (k>=1), which enforce the Top-k predicted labels of an adversarial example to be the k (randomly) selected and ordered labels (the ground-truth label is exclusive). Two methods are presented. First, we extend the vanilla Carlini-Wagner (C&W) method and use it as a strong baseline. Second, we present an adversarial distillation framework consisting of two components: (i) Computing an adversarial probability distribution for any given ordered Top-$k$ targeted labels. (ii) Learning adversarial examples by minimizing the Kullback-Leibler (KL) divergence between the adversarial distribution and the predicted distribution, together with the perturbation energy penalty. In computing adversarial distributions, we explore how to leverage label semantic similarities, leading to knowledge-oriented attacks. In experiments, we test Top-k (k=1,2,5,10) attacks in the ImageNet-1000 val dataset using two popular DNNs trained with the clean ImageNet-1000 train dataset, ResNet-50 and DenseNet-121. Overall, the adversarial distillation approach obtains the best results, especially by large margin when computation budget is limited.. It reduces the perturbation energy consistently with the same attack success rate on all the four k's, and improve the attack success rate by large margin against the modified C&W method for k=10.