Keywords: GANs, data augmentation, BigGAN
TL;DR: BigGANs do not capture the ImageNet data distributions and are only modestly successful for data augmentation.
Abstract: Recent advances in Generative Adversarial Networks (GANs) – in architectural design, training strategies, and empirical tricks – have led nearly photorealistic samples on large-scale datasets such as ImageNet. In fact, for one model in particular, BigGAN, metrics such as Inception Score or Frechet Inception Distance nearly match those of the dataset, suggesting that these models are close to match-ing the distribution of the training set. Given the quality of these models, it is worth understanding to what extent these samples can be used for data augmentation, a task expressed as a long-term goal of the GAN research project. To that end, we train ResNet-50 classifiers using either purely BigGAN images or mixtures of ImageNet and BigGAN images, and test on the ImageNet validation set.Our preliminary results suggest both a measured view of state-of-the-art GAN quality and highlight limitations of current metrics. Using only BigGAN images, we find that Top-1 and Top-5 error increased by 120% and 384%, respectively, and furthermore, adding more BigGAN data to the ImageNet training set at best only marginally improves classifier performance. Finally, we find that neither Inception Score, nor FID, nor combinations thereof are predictive of classification accuracy. These results suggest that as GANs are beginning to be deployed in downstream tasks, we should create metrics that better measure downstream task performance. We propose classification performance as one such metric that, in addition to assessing per-class sample quality, is more suited to such downstream tasks.