Sparse Unbalanced GAN Training with In-Time Over-Parameterization

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: sparse unbalanced GAN training, GAN training, dynamic sparse training, sparse training, BigGAN
Abstract: Generative adversarial networks (GANs) have received surging interest since their introduction due to the high quality of the data they generate. While GANs achieve increasingly impressive results, the resource demands associated with their large model size hinder their use in resource-limited scenarios. For inference, existing model compression techniques can reduce model complexity while retaining comparable performance. However, the training efficiency of GANs has been less explored due to their fragile training process. In this paper, we for the first time explore the possibility of directly training sparse GANs from scratch without involving any dense or pre-training steps. Even more unconventionally, our proposed method enables training sparse unbalanced GANs with an extremely sparse generator in an end-to-end way, chasing high training and inference efficiency gains. Instead of training full GANs, we start by training a sparse subnetwork and periodically explore the sparse connectivity during training while maintaining a fixed parameter count. Extensive experiments with modern GAN architectures validate the efficiency of our method. Our sparsified GANs, trained from scratch in a single run, outperform those learned by expensive iterative pruning and retraining. Perhaps most importantly, we find that directly training sparse GANs from scratch can be a much more efficient solution than inheriting parameters from expensive pre-trained GANs. For example, training with only an 80% sparse generator and a 50% sparse discriminator, our method achieves even better performance than the dense BigGAN.
One-sentence Summary: We present what we believe to be the first pilot study on training a sparse GAN with unbalanced sparsity between the generator and discriminator, without involving any dense or pre-training steps.
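
The training scheme the abstract describes (start from a sparse subnetwork, periodically prune and regrow connections at a fixed parameter count, with different sparsity levels for the generator and discriminator) is a form of dynamic sparse training. Below is a minimal PyTorch-style sketch of such a loop, assuming magnitude-based pruning and random regrowth; the helper names (random_mask, prune_and_regrow), the placeholder networks, the 20% update fraction, and the update interval are illustrative assumptions, not the paper's exact procedure. The 80%/50% sparsity split mirrors the example quoted in the abstract.

```python
import torch
import torch.nn as nn

def random_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Random binary mask keeping a (1 - sparsity) fraction of the entries."""
    n_keep = int(weight.numel() * (1.0 - sparsity))
    keep_idx = torch.randperm(weight.numel(), device=weight.device)[:n_keep]
    mask = torch.zeros(weight.numel(), dtype=torch.bool, device=weight.device)
    mask[keep_idx] = True
    return mask.view_as(weight)

def prune_and_regrow(weight, mask, update_frac=0.2):
    """Drop the smallest-magnitude active weights and regrow an equal number of
    randomly chosen inactive connections, keeping the parameter count fixed."""
    n_update = int(mask.sum().item() * update_frac)
    # Prune: deactivate the smallest-magnitude currently-active weights.
    magnitudes = weight.abs().masked_fill(~mask, float("inf"))
    drop_idx = torch.topk(magnitudes.flatten(), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = False
    # Regrow: activate the same number of currently inactive positions.
    inactive_idx = (~mask).flatten().nonzero(as_tuple=False).squeeze(1)
    pick = torch.randperm(inactive_idx.numel(), device=weight.device)[:n_update]
    mask.view(-1)[inactive_idx[pick]] = True
    weight.data.view(-1)[inactive_idx[pick]] = 0.0  # new connections start at zero
    return mask

# Placeholder networks; a real setup would use a GAN pair such as BigGAN's G and D.
generator = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
discriminator = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))

# Unbalanced sparsity as quoted in the abstract: 80% sparse G, 50% sparse D.
g_masks = {n: random_mask(p, 0.80) for n, p in generator.named_parameters() if p.dim() > 1}
d_masks = {n: random_mask(p, 0.50) for n, p in discriminator.named_parameters() if p.dim() > 1}

topology_update_interval = 1000  # illustrative value

for step in range(10_000):
    # ... usual alternating GAN updates on discriminator and generator go here ...
    for masks, model in ((g_masks, generator), (d_masks, discriminator)):
        # Re-apply masks after each optimizer step so pruned weights stay zero.
        for n, p in model.named_parameters():
            if n in masks:
                p.data *= masks[n]
        # Periodically explore new sparse connectivity at a fixed parameter count.
        if step % topology_update_interval == 0:
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] = prune_and_regrow(p, masks[n])
```

Keeping the connectivity updates periodic and parameter-count-neutral is what lets both sparse networks explore many topologies over a single run without ever densifying, which is the efficiency argument the abstract makes against iterative prune-and-retrain pipelines.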