SELF-KNOWLEDGE DISTILLATION ADVERSARIAL ATTACK

Anonymous

Sep 25, 2019 ICLR 2020 Conference Blind Submission
  • Abstract: Neural networks are highly vulnerable to adversarial examples. By adding a small perturbation to a clean image, an attacker can completely fool a network that otherwise has high classification accuracy. One intriguing property of adversarial examples is transferability: examples crafted against one network often also fool networks of unknown structure, which poses a threat even in the physical world. Current methods for generating adversarial examples fall mainly into optimization-based and gradient-based methods. Liu et al. (2017) conjecture that gradient-based methods can hardly produce transferable targeted adversarial examples in the black-box setting. In this paper, however, we use a simple technique to improve the transferability and success rate of targeted attacks with gradient-based methods, and we demonstrate that gradient-based methods can also generate transferable adversarial examples in targeted attacks. Specifically, we apply knowledge distillation to gradient-based methods and show that transferability can be improved by making effective use of information about the different classes. Unlike the usual applications of knowledge distillation, we do not train a student network to generate adversarial examples. Instead, we take advantage of the fact that knowledge distillation softens the target and thereby exposes more information, and we combine the soft target and the hard target of the same network in the loss function. Our method is applicable to most gradient-based attack methods.
  • Keywords: Adversarial Examples, Transferability, black-box targeted attack, Distillation
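
The abstract does not give the exact loss, so the following is only a minimal sketch of how a soft target and a hard target from the same network might be combined in a targeted gradient-based attack. Everything here is an assumption made for illustration: the temperature `T`, the mixing weight `alpha`, the toy linear classifier `W @ x + b`, and all function names are hypothetical and are not taken from the paper. The "soft" term is taken to be the cross-entropy of the temperature-softened softmax against the target class, and the update is a single signed-gradient step.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; a larger T gives a flatter distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def combined_targeted_loss(logits, target, T=4.0, alpha=0.5):
    """Assumed loss: weighted sum of hard (T=1) and soft (T>1) cross-entropy
    towards the target class, both computed from the same logits."""
    hard = -np.log(softmax(logits, 1.0)[target])
    soft = -np.log(softmax(logits, T)[target])
    return (1.0 - alpha) * hard + alpha * soft

def loss_grad_wrt_logits(logits, target, T=4.0, alpha=0.5):
    """Analytic gradient of combined_targeted_loss with respect to the logits."""
    onehot = np.zeros(len(logits))
    onehot[target] = 1.0
    g_hard = softmax(logits, 1.0) - onehot
    g_soft = (softmax(logits, T) - onehot) / T  # chain rule through z / T
    return (1.0 - alpha) * g_hard + alpha * g_soft

def targeted_sign_step(x, W, b, target, eps=0.05, T=4.0, alpha=0.5):
    """One signed-gradient step that lowers the loss towards the target class.
    The linear model W @ x + b is a stand-in for a real network."""
    logits = W @ x + b
    grad_x = W.T @ loss_grad_wrt_logits(logits, target, T, alpha)
    return x - eps * np.sign(grad_x)

# Toy usage: the input is currently classified as class 0; push it towards class 2.
W, b = np.eye(3), np.zeros(3)
x = np.array([2.0, 0.0, 0.0])
x_adv = targeted_sign_step(x, W, b, target=2)
loss_before = combined_targeted_loss(W @ x + b, 2)
loss_after = combined_targeted_loss(W @ x_adv + b, 2)
```

In this sketch the softened term keeps a non-negligible gradient on the non-target classes even when the hard softmax is nearly saturated, which is one plausible reading of "utilizing different classes of information"; the paper itself may define the soft target differently.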