Keywords: Zero-infinity distance, Two-timescale extragradient, Implicit bias of extragradient
TL;DR: We propose a zero-infinity GAN that avoids both vanishing gradients and Lipschitz constraints, where extragradient enables stable training by escaping strict non-minimax points and biasing the generator toward minimum-norm solutions.
Abstract: In supervised learning, gradient descent achieves near-zero empirical risk while favoring solutions that generalize well---a phenomenon attributed to the implicit bias of gradient methods. In stark contrast, in generative models such as generative adversarial networks (GANs), gradient methods typically fail to achieve zero empirical risk, and thus implicit bias remains both empirically elusive and theoretically unexplored. We bridge this gap by developing new perspectives on the loss landscape of GANs together with the gradient dynamics and implicit bias of extragradient. First, regarding the loss, we challenge the prevailing preference for the Wasserstein distance, and instead propose the zero-infinity distance---a metric that equals zero when two distributions match exactly and infinity otherwise---as more compatible with gradient-based minimax optimization. On the gradient dynamics side, we prove for the first time in GANs that certain stationary points are strict non-minimax points, the minimax analogue of strict saddles in minimization. This enables the two-timescale extragradient method to effectively escape such non-optimal points---similar to gradient descent escaping strict saddles---while remaining stable at global solutions, in contrast to existing methods. Lastly, regarding the implicit bias, we show that extragradient favors the minimum-norm generator solution when initialized at zero and training only the last layer of the neural network.
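To make the update rule concrete, below is a minimal NumPy sketch of the two-timescale extragradient iteration on a toy strongly-convex-strongly-concave minimax objective; the objective, step sizes, and helper names (`grad_x`, `grad_y`) are illustrative assumptions and not the paper's GAN setup.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
mu, nu = 0.1, 0.1  # regularization strengths of the toy objective

def grad_x(x, y):
    # Gradient of f(x, y) = x^T A y + (mu/2)||x||^2 - (nu/2)||y||^2 w.r.t. x (min player).
    return A @ y + mu * x

def grad_y(x, y):
    # Gradient of the same f w.r.t. y (max player).
    return A.T @ x - nu * y

def two_timescale_extragradient(x, y, eta_x=0.02, eta_y=0.1, steps=2000):
    """Extragradient with distinct step sizes for the min and max players (two timescales)."""
    for _ in range(steps):
        # Extrapolation ("look-ahead") step using gradients at the current point.
        x_half = x - eta_x * grad_x(x, y)
        y_half = y + eta_y * grad_y(x, y)
        # Update step using gradients evaluated at the extrapolated point.
        x = x - eta_x * grad_x(x_half, y_half)
        y = y + eta_y * grad_y(x_half, y_half)
    return x, y

# Zero initialization for the min player, echoing the abstract's setting.
x0, y0 = np.zeros(3), rng.standard_normal(3)
x_sol, y_sol = two_timescale_extragradient(x0, y0)
print(np.linalg.norm(grad_x(x_sol, y_sol)), np.linalg.norm(grad_y(x_sol, y_sol)))
```

The two step sizes `eta_x` and `eta_y` are what "two-timescale" refers to; setting them equal recovers the standard extragradient method.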
Submission Number: 165