Adversarial examples for generative models

23 Nov 2024 (modified: 22 Oct 2023)
Submitted to ICLR 2017
Readers: Everyone
TL;DR: Exploration of adversarial examples against latent space generative models on multiple datasets.
Abstract: We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present two classes of attacks on the VAE-GAN architecture and demonstrate them against networks trained on MNIST, SVHN, and CelebA. Our first attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our second attack moves beyond relying on the standard loss for computing the gradient and directly optimizes against differences in source and target latent representations. We additionally present an interesting visualization, which gives insight into how adversarial examples appear in generative models.
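The second attack described in the abstract (optimizing directly against the difference between source and target latent representations) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: a random linear map `W` stands in for a trained VAE encoder, and `latent_attack`, `lam`, `lr`, and `steps` are hypothetical names and values, not the authors' implementation.

```python
import numpy as np

def latent_attack(W, x_source, x_target, lam=0.01, lr=0.05, steps=500):
    """Toy latent-space attack on a linear stand-in encoder z = W @ x.

    Minimizes ||W(x_source + delta) - W x_target||^2 + lam * ||delta||^2
    by plain gradient descent on the perturbation delta.
    """
    z_target = W @ x_target              # latent code the attacker wants to hit
    delta = np.zeros_like(x_source)      # perturbation added to the source input
    for _ in range(steps):
        residual = W @ (x_source + delta) - z_target
        # Gradient of the latent distance term plus the perturbation penalty.
        grad = 2.0 * W.T @ residual + 2.0 * lam * delta
        delta -= lr * grad
    return x_source + delta

# Hypothetical data: a 16-dimensional input space and a 4-dimensional latent space.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) / 4.0
x_source = rng.normal(size=16)
x_target = rng.normal(size=16)

x_adv = latent_attack(W, x_source, x_target)
dist_before = np.linalg.norm(W @ x_source - W @ x_target)
dist_after = np.linalg.norm(W @ x_adv - W @ x_target)
```

In this sketch `lam` trades off how closely the adversarial input matches the target in latent space against how far it moves from the source input. With a real VAE or VAE-GAN, the encoder is nonlinear and the gradient would typically come from automatic differentiation rather than a closed form.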
Keywords: Deep learning
Conflicts: nus.edu.sg, google.com, cs.berkeley.edu
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:1702.06832/code)