- Abstract: The variational autoencoder (VAE) is an efficient method to learn probabilistic latent variable models by the use of an inference network, which predicts a distribution over latent variables given the input. However, VAEs are known to suffer from ``posterior collapse'' when combined with flexible neural autoregressive generators such as LSTMs or PixelCNNs, where the generator tends to ignore the latent variables and the variational posterior collapses to the prior. In this paper, we investigate this problem from the perspective of training dynamics. We find that the approximate posterior distribution lags far behind the model's true posterior in the initial stages of training, which pressures the generator to ignore the latent encoding. To address this issue, we propose an extremely simple training procedure for VAE models that mitigates the lagging issue: aggressively optimizing the inference network with more updates before reverting back to basic VAE training. Despite introducing neither new components nor significant complexity over basic VAEs, our approach is able to circumvent the collapse problem that has plagued much previous work on VAE-based models. Empirically, our approach outperforms strong autoregressive baselines on text and image benchmarks in terms of density estimation, and achieves results competitive with more complicated previous methods. Our method also trains 5x faster on average than the most comparable state-of-the-art method, the semi-amortized VAE.
- Keywords: variational autoencoders, posterior collapse, generative models
- TL;DR: To address posterior collapse in VAEs, we propose a simple yet effective training algorithm that aggressively optimizes the inference network with more updates
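The training schedule described in the abstract can be sketched as follows. This is a minimal, hedged illustration of the control flow only: during an aggressive phase, the inference network (encoder) takes many gradient steps per single generator (decoder) step, until the objective stops improving; afterwards training reverts to standard 1:1 VAE updates. The function names, the fixed-epoch switch, and the stopping tolerance below are illustrative assumptions, not the paper's exact model or hyperparameters.

```python
def train_vae_aggressive(enc_step, dec_step, elbo, n_epochs=10,
                         aggressive_epochs=5, max_inner=50, tol=1e-4):
    """Sketch of the aggressive VAE training schedule (names are assumptions).

    enc_step / dec_step: callables that apply one gradient update to the
    encoder / decoder parameters in place.
    elbo: callable returning the current objective (higher is better).
    """
    history = []
    for epoch in range(n_epochs):
        # Simple fixed schedule for when to stop the aggressive phase;
        # the paper's actual criterion (e.g. tracking mutual information)
        # would go here instead.
        aggressive = epoch < aggressive_epochs
        if aggressive:
            # Inner loop: keep updating the encoder until the ELBO
            # stops improving, so the approximate posterior catches up
            # with the model's true posterior.
            prev = elbo()
            for _ in range(max_inner):
                enc_step()
                cur = elbo()
                if cur - prev < tol:
                    break
                prev = cur
        else:
            enc_step()  # basic VAE training: one encoder step ...
        dec_step()      # ... and one decoder step per iteration
        history.append(elbo())
    return history
```

As a usage sketch, `enc_step` and `dec_step` would wrap optimizer steps on the encoder's and decoder's parameters respectively; the key point is only the asymmetric update ratio during the aggressive phase.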