Generating Adversarial Examples with Adversarial Networks


Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: Deep neural networks (DNNs) have recently been found to be vulnerable to adversarial examples, which result from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs into producing adversary-selected results. Various attack strategies have been proposed to generate adversarial examples, but generating them more efficiently and ensuring the diversity of the adversarial perturbations requires further research. In this paper, we propose AdvGAN, which generates adversarial examples with generative adversarial networks (GANs) that can learn and preserve the distribution of original instances. Once the AdvGAN generator is trained, it can efficiently produce an adversarial perturbation for any instance, and the generated adversarial examples vary widely according to the underlying data distribution, which can accelerate adversarial training as a defense. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, unlike traditional white-box attacks, there is no need to access the original target model after the generator is trained. In black-box attacks, we dynamically train a distilled model of the black-box model and optimize the generator accordingly. Extensive experimental results show that black-box attacks based on AdvGAN achieve attack success rates comparable to those in the semi-whitebox setting. Adversarial examples generated by AdvGAN on different models achieve higher attack success rates under state-of-the-art defenses than other attacks do. We achieved 92.76% accuracy on the MNIST black-box challenge and were ranked first.
  • TL;DR: We propose generating adversarial examples with generative adversarial networks in semi-whitebox and black-box settings.
  • Keywords: adversarial examples, generative adversarial network, black-box attack
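The abstract does not state AdvGAN's training objective. As a rough, non-authoritative sketch, the pure-Python code below shows loss terms that a perturbation generator G of this kind could combine, evaluated on tiny hand-made vectors. It includes a targeted margin loss that pushes x + G(x) toward a target class, a GAN term that keeps x + G(x) close to the data distribution, and a hinge penalty that bounds the size of G(x). It also shows a cross-entropy distillation loss for fitting a surrogate model to black-box outputs. Every function name, the choice of a C&W-style margin loss, the hinge bound `c`, and the weights `alpha` and `beta` are assumptions made for illustration. None of them is taken from the paper.

```python
import math
from typing import List, Sequence


def hinge_perturbation_loss(perturbation: Sequence[float], c: float) -> float:
    """Penalize the L2 norm of the perturbation G(x) only when it exceeds c."""
    norm = math.sqrt(sum(p * p for p in perturbation))
    return max(0.0, norm - c)


def gan_loss(d_real: float, d_fake: float) -> float:
    """Standard GAN value: log D(x) + log(1 - D(x + G(x))).

    The discriminator maximizes this value. The generator minimizes it, which
    pushes D(x + G(x)) toward 1, i.e. makes x + G(x) look like real data.
    """
    return math.log(d_real) + math.log(1.0 - d_fake)


def targeted_margin_loss(logits: Sequence[float], target: int,
                         kappa: float = 0.0) -> float:
    """Assumed C&W-style loss: positive while another class beats the target."""
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)


def generator_objective(logits: Sequence[float], target: int,
                        d_real: float, d_fake: float,
                        perturbation: Sequence[float],
                        alpha: float = 1.0, beta: float = 1.0,
                        c: float = 0.3) -> float:
    """Hypothetical weighted sum: L_adv + alpha * L_GAN + beta * L_hinge."""
    return (targeted_margin_loss(logits, target)
            + alpha * gan_loss(d_real, d_fake)
            + beta * hinge_perturbation_loss(perturbation, c))


def distillation_loss(black_box_probs: List[Sequence[float]],
                      student_probs: List[Sequence[float]]) -> float:
    """Mean cross-entropy of the distilled model's outputs against the
    black-box outputs, averaged over the queried inputs."""
    total = 0.0
    for b, f in zip(black_box_probs, student_probs):
        # Skip zero-probability black-box entries (0 * log f contributes 0).
        total -= sum(bi * math.log(fi) for bi, fi in zip(b, f) if bi > 0)
    return total / len(black_box_probs)


# Toy usage: logits where class 1 still beats target class 0 by a margin of 2.
loss = generator_objective([1.0, 3.0, 2.0], target=0, d_real=0.5, d_fake=0.5,
                           perturbation=[3.0, 4.0], c=4.0)
print(round(loss, 4))
```

In a black-box setting like the one the abstract describes, one plausible loop would alternate two steps. The first queries the black-box model on both x and x + G(x) and refits the distilled model by minimizing a loss like `distillation_loss`. The second updates G against the refreshed distilled model using an objective like `generator_objective`. Gradient computation is omitted here. A real implementation would use an autodiff framework.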