Abstract: Generative Adversarial Networks (GANs), when trained on large datasets with diverse modes, are known to produce conflated images that do not distinctly belong to any of the modes. We hypothesize that this problem arises from the interaction of two facts: (1) for datasets with large variety, the modes likely lie on separate manifolds; (2) the generator (G) is formulated as a continuous function, and the input noise is drawn from a connected set, so G's output is also a connected set. If G covers all modes, then some portion of G's output must connect them; this portion corresponds to undesirable, conflated images. We develop theoretical arguments to support these intuitions. We propose a novel method that breaks the second condition via learnable discontinuities in the latent noise space. Equivalently, it can be viewed as training several generators, thus creating discontinuities in the G function. We also augment the GAN formulation with a classifier C that predicts which noise partition/generator produced the output images, encouraging diversity across partitions/generators. We experiment on MNIST, CelebA, STL-10, and a difficult dataset with clearly distinct modes, and show that the noise partitions correspond to different modes of the data distribution and produce images of superior quality.
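To make the mechanism concrete, here is a minimal PyTorch sketch of the partitioned-noise idea: K noise partitions, each served by its own small generator, and a discriminator with an auxiliary classifier head C that predicts the partition id. All names, layer sizes, and the unweighted loss sum are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of partitioned latent noise with a partition classifier.
# K, Z_DIM, and the MLP sizes below are hypothetical choices for illustration.
import torch
import torch.nn as nn

K = 10        # assumed number of noise partitions / generators
Z_DIM = 100   # assumed latent dimensionality


class Generator(nn.Module):
    """One generator per partition; together they form a discontinuous G."""
    def __init__(self, z_dim=Z_DIM, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)


class DiscriminatorClassifier(nn.Module):
    """Shared trunk with two heads: real/fake score D and partition classifier C."""
    def __init__(self, img_dim=28 * 28, k=K):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2))
        self.d_head = nn.Linear(256, 1)   # real vs. fake logit
        self.c_head = nn.Linear(256, k)   # which partition produced the image

    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.c_head(h)


generators = nn.ModuleList([Generator() for _ in range(K)])
dc = DiscriminatorClassifier()
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()


def generator_loss(batch_size=64):
    # Sample a partition id per example, then route noise through that
    # partition's generator (the learned discontinuity in latent space).
    ids = torch.randint(0, K, (batch_size,))
    z = torch.randn(batch_size, Z_DIM)
    fake = torch.stack([generators[int(i)](z[j]) for j, i in enumerate(ids)])
    d_logits, c_logits = dc(fake)
    # The generators want D to score their samples as real, and C to recover
    # the partition id, which pushes partitions toward distinct modes.
    return bce(d_logits, torch.ones_like(d_logits)) + ce(c_logits, ids)


loss = generator_loss()
loss.backward()
```

The cross-entropy term rewards the generator bank only when C can recover the partition id from the image alone, which is what encourages different partitions to specialize in different modes.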
TL;DR: We introduce theory explaining why GANs fail on complex datasets and propose a fix.
Keywords: generative adversarial networks, GANs, deep learning, unsupervised learning, generative models, adversarial learning
Data: [CelebA](https://paperswithcode.com/dataset/celeba), [ImageNet](https://paperswithcode.com/dataset/imagenet), [LSUN](https://paperswithcode.com/dataset/lsun), [STL-10](https://paperswithcode.com/dataset/stl-10)