Towards GAN Benchmarks Which Require Generalization

Ishaan Gulrajani; Colin Raffel; Luke Metz

Published: 21 Dec 2018, Last Modified: 22 Jun 2025 | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation, implement an example black-box metric based on these ideas, and validate experimentally that it can measure a notion of generalization.
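To make the idea of a divergence "defined in terms of a neural network trained to distinguish between distributions" concrete, the following is a minimal sketch, not the metric implemented in the paper. It assumes a linear logistic critic as a stand-in for a neural network, 2-D Gaussian toy data in place of images, and a held-out mean logit gap as the score; the name `critic_divergence` and all hyperparameters are hypothetical.

```python
import numpy as np

def critic_divergence(p_train, q_train, p_test, q_test, steps=500, lr=0.1):
    """Train a logistic critic to separate samples of p from samples of q,
    then score it on held-out samples (illustrative stand-in for an NND)."""
    w, b = np.zeros(p_train.shape[1]), 0.0
    x = np.concatenate([p_train, q_train])
    y = np.concatenate([np.ones(len(p_train)), np.zeros(len(q_train))])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # critic's P(sample came from p)
        grad = prob - y                            # gradient of the logistic loss w.r.t. the logit
        w -= lr * (x.T @ grad) / len(x)
        b -= lr * grad.mean()
    # Held-out gap in mean critic logit: near zero when the critic cannot tell p from q.
    return float((p_test @ w + b).mean() - (q_test @ w + b).mean())

rng = np.random.default_rng(0)
n, d = 2000, 2
# Assumption: "data" is a standard Gaussian; one "model" matches it, another is shifted.
data_train, data_test = rng.normal(size=(n, d)), rng.normal(size=(n, d))
good_train, good_test = rng.normal(size=(n, d)), rng.normal(size=(n, d))
bad_train, bad_test = rng.normal(1.0, 1.0, size=(n, d)), rng.normal(1.0, 1.0, size=(n, d))

same = critic_divergence(data_train, good_train, data_test, good_test)
shifted = critic_divergence(bad_train, data_train, bad_test, data_test)
print(f"matching model: {same:.3f}, shifted model: {shifted:.3f}")
```

Because the critic is fit on one sample and scored on another, the estimate depends on how many model samples are available. A linear critic on toy Gaussians cannot expose memorization, so this sketch illustrates only the train-a-critic-then-score structure, under the assumptions stated above.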

Keywords: evaluation, generative adversarial networks, adversarial divergences

TL;DR: We argue that GAN benchmarks must require a large sample from the model to penalize memorization and investigate whether neural network divergences have this property.

Data: [ImageNet](https://paperswithcode.com/dataset/imagenet)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/towards-gan-benchmarks-which-require/code) (via CatalyzeX)
