Adversarial examples for generative models

Jernej Kos; Ian Fischer; Dawn Song

06 Jul 2025 (modified: 22 Jun 2025)
Submitted to ICLR 2017
Readers: Everyone

TL;DR: Exploration of adversarial examples against latent space generative models on multiple datasets.

Abstract: We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on applying adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present two classes of attacks on the VAE-GAN architecture and demonstrate them against networks trained on MNIST, SVHN, and CelebA. Our first attack uses the VAE loss function directly, so that the model reconstructs a chosen target image from the adversarial example. Our second attack does not rely on the standard loss to compute the gradient; instead, it directly minimizes the difference between the latent representation of the adversarial example and that of the target. We additionally present a visualization that gives insight into how adversarial examples appear in generative models.
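The idea behind the second (latent) attack can be illustrated with a minimal sketch under stated assumptions. It uses a hypothetical linear encoder whose mean is f_enc(x) = W x (a real VAE-GAN encoder is a deep network, and its gradient would come from automatic differentiation), and it adds an assumed L2 penalty, weighted by lam, that keeps the adversarial input x_adv close to the original image x. The function name latent_attack and all parameter values are illustrative only, not the authors' implementation.

```python
import numpy as np


def latent_attack(x, z_target, W, lam=0.1, lr=0.05, steps=500):
    """Projected gradient descent on
        ||W @ x_adv - z_target||^2 + lam * ||x_adv - x||^2.

    W is a hypothetical linear stand-in for the encoder mean f_enc(x) = W @ x.
    After each step, pixel values are clipped back to the valid range [0, 1].
    """
    x_adv = x.copy()
    for _ in range(steps):
        # Gap between the adversarial input's latent code and the target's.
        latent_residual = W @ x_adv - z_target
        # Analytic gradient of the objective for the linear toy encoder.
        grad = 2.0 * W.T @ latent_residual + 2.0 * lam * (x_adv - x)
        # Gradient step, then projection onto the valid pixel range.
        x_adv = np.clip(x_adv - lr * grad, 0.0, 1.0)
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 64, 8                              # toy input and latent sizes
    W = rng.normal(scale=0.1, size=(k, d))    # hypothetical encoder weights
    x_source = rng.uniform(size=d)            # image to perturb
    x_target = rng.uniform(size=d)            # image whose latent code we want
    z_target = W @ x_target

    x_adv = latent_attack(x_source, z_target, W)
    print("latent distance before:", np.linalg.norm(W @ x_source - z_target))
    print("latent distance after: ", np.linalg.norm(W @ x_adv - z_target))
    print("perturbation L2 norm:  ", np.linalg.norm(x_adv - x_source))
```

Because the toy encoder is linear, the gradient is written out by hand; with a real encoder network it would be obtained from an autodiff framework instead. Increasing lam trades a smaller perturbation for a larger remaining distance to the target latent code.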

Keywords: Deep learning

Conflicts: nus.edu.sg, google.com, cs.berkeley.edu