- Abstract: There have been multiple attempts with variational auto-encoders (VAE) to learn powerful global representations of complex data using a combination of latent stochastic variables and an autoregressive model over the dimensions of the data. However, for the most challenging natural image tasks the purely autoregressive model with stochastic variables still outperform the combined stochastic autoregressive models. In this paper, we present simple additions to the VAE framework that generalize to natural images by embedding spatial information in the stochastic layers. We significantly improve the state-of-the-art results on MNIST, OMNIGLOT, CIFAR10 and ImageNet when the feature map parameterization of the stochastic variables are combined with the autoregressive PixelCNN approach. Interestingly, we also observe close to state-of-the-art results without the autoregressive part. This opens the possibility for high quality image generation with only one forward-pass.
- TL;DR: We present a generative model that proves state-of-the-art results on gray-scale and natural images.
- Keywords: deep learning, representation learning, variational auto-encoders, variational inference, generative models