Use mean field theory to train a 200-layer vanilla GAN

Dan Li, Shuang Liu, Zhengxin Lyu, Weilai Xiang, Wei He, Fengqi Liu, Zhen Zhang

Published: 2021, Last Modified: 02 Mar 2026 · ICTAI 2021 · CC BY-SA 4.0
Abstract: Training a deep generative adversarial network (GAN) with hundreds or even thousands of layers is difficult. Because the backpropagation path through the generator is deeper than through the discriminator, the generator is especially prone to vanishing or exploding gradients. This paper proposes a method, based on mean field theory, for training a deep vanilla GAN. By adjusting the parameter variances and the activation function of the GAN, a 200-layer vanilla GAN can be trained stably without adding any batch normalization layers or residual blocks. We demonstrate that a deep GAN is very sensitive to the parameter variances $\sigma_w^2$ and $\sigma_b^2$ in the initialization scheme, and explain why hard tanh is more suitable than ReLU as the activation in a deep vanilla GAN. Experiments on the MNIST and Fashion-MNIST data sets validate that our method trains a deep vanilla GAN well and produces high-quality images.
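To illustrate the idea behind the abstract, the sketch below propagates a signal through a 200-layer fully connected network with hard tanh activations, using mean-field-style initialization: weights drawn from $N(0, \sigma_w^2/\text{fan\_in})$ and biases from $N(0, \sigma_b^2)$. This is not the authors' code, and the specific variance values (`sigma_w2=1.5`, `sigma_b2=0.05`) and width are illustrative assumptions, not the paper's tuned settings; the point is only that with bounded activations and variance-scaled weights, the per-layer signal magnitude neither vanishes nor explodes at depth 200.

```python
import numpy as np

def hard_tanh(x):
    # Hard tanh: identity on [-1, 1], clipped outside.
    return np.clip(x, -1.0, 1.0)

def forward_depth(depth=200, width=256, sigma_w2=1.5, sigma_b2=0.05, seed=0):
    """Propagate one random input through `depth` layers and record
    the mean squared activation after each layer.

    Mean-field-style initialization (illustrative, not the paper's values):
    weights ~ N(0, sigma_w2 / fan_in), biases ~ N(0, sigma_b2).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    stats = []
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / width), size=(width, width))
        b = rng.normal(0.0, np.sqrt(sigma_b2), size=width)
        x = hard_tanh(W @ x + b)
        stats.append(float(np.mean(x ** 2)))
    return stats

stats = forward_depth()
print(f"layer 1 E[x^2]={stats[0]:.3f}, layer 200 E[x^2]={stats[-1]:.3f}")
```

Because hard tanh is bounded, the mean squared activation can never exceed 1, and with the weight variance scaled by fan-in it settles at a stable, nonzero fixed point instead of collapsing to zero; repeating the experiment with an unscaled ReLU initialization typically shows the signal magnitude drifting or blowing up with depth, which is consistent with the paper's claim that hard tanh is the better fit for a very deep vanilla GAN.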