Adversarial examples for generative models

Jernej Kos, Ian Fischer, Dawn Song

Feb 17, 2017 (modified: Mar 08, 2017) ICLR 2017 workshop submission
  • Abstract: We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present two classes of attacks on the VAE-GAN architecture and demonstrate them against networks trained on MNIST, SVHN, and CelebA. Our first attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our second attack moves beyond relying on the standard loss for computing the gradient and directly optimizes against differences in source and target latent representations. We additionally present an interesting visualization, which gives insight into how adversarial examples appear in generative models.
  • TL;DR: Exploration of adversarial examples against latent space generative models on multiple datasets.
  • Keywords: Deep learning
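The second attack described in the abstract, optimizing against the difference between source and target latent representations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the encoder is a hypothetical fixed linear map `W` rather than a trained VAE or VAE-GAN encoder, the objective is a squared latent distance plus an L2 penalty on the perturbation weighted by `c`, and the optimizer is plain gradient descent. The objective and optimizer actually used by the authors may differ.

```python
import numpy as np

def latent_attack(W, x_source, x_target, c=0.01, lr=0.05, steps=1000):
    """Toy latent-space attack against a linear encoder z = W @ x.

    Minimizes ||W(x_source + delta) - W x_target||^2 + c * ||delta||^2
    over the perturbation delta by gradient descent.
    """
    z_target = W @ x_target
    delta = np.zeros_like(x_source)
    for _ in range(steps):
        # Gap between the adversarial latent code and the target latent code.
        residual = W @ (x_source + delta) - z_target
        # Analytic gradient of the objective with respect to delta.
        grad = 2.0 * W.T @ residual + 2.0 * c * delta
        delta -= lr * grad
    return x_source + delta

# Hypothetical stand-in encoder: 16-dimensional input, 4-dimensional latent space.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) / 4.0
x_source = rng.normal(size=16)
x_target = rng.normal(size=16)

x_adv = latent_attack(W, x_source, x_target)

latent_gap_before = np.linalg.norm(W @ x_source - W @ x_target)
latent_gap_after = np.linalg.norm(W @ x_adv - W @ x_target)
print(f"latent gap before: {latent_gap_before:.4f}")
print(f"latent gap after:  {latent_gap_after:.4f}")
print(f"perturbation norm: {np.linalg.norm(x_adv - x_source):.4f}")
```

The penalty weight `c` trades off how closely the adversarial latent code matches the target against how far the input moves from the source. With a real nonlinear encoder, the analytic gradient would be replaced by automatic differentiation through the network.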