- Keywords: VAE, generative modeling, deep learning, likelihood-based models
- Abstract: We present a hierarchical VAE that, for the first time, outperforms the PixelCNN in log-likelihood on all natural image benchmarks. Our work is motivated by the observation that VAEs can actually implement autoregressive models, and other, more efficient generative models, if made sufficiently deep. Despite this, autoregressive models have traditionally outperformed VAEs. To test whether insufficient depth explains this gap, we develop an architecture with more stochastic layers than previous work and train it on CIFAR-10, ImageNet, and FFHQ. We find that, in comparison to the PixelCNN, very deep VAEs achieve higher likelihoods, use fewer parameters, generate samples thousands of times faster, and are more easily applied to high-resolution images. We attribute this to the VAEs learning efficient hierarchical representations, which we verify with visualizations of the generative process.
- One-sentence Summary: We argue deeper VAEs should perform better, implement one, and show it outperforms all PixelCNN-based autoregressive models in likelihood, while being substantially more efficient.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
- Supplementary Material: zip