- Keywords: conditional variational autoencoder, posterior collapse, generative models
- TL;DR: We propose a conditional variational autoencoder framework that mitigates the posterior collapse in scenarios where the conditioning signal strong enough for an expressive decoder to generate a plausible output from it.
- Abstract: Training conditional generative latent-variable models is challenging in scenarios where the conditioning signal is very strong and the decoder is expressive enough to generate a plausible output given only the condition; the generative model tends to ignore the latent variable, suffering from posterior collapse. We find, and empirically show, that one of the major reasons behind posterior collapse is rooted in the way that generative models are conditioned, i.e., through concatenation of the latent variable and the condition. To mitigate this problem, we propose to explicitly make the latent variables depend on the condition by unifying the conditioning and latent variable sampling, thus coupling them so as to prevent the model from discarding the root of variations. To achieve this, we develop a conditional Variational Autoencoder architecture that learns a distribution not only of the latent variables, but also of the condition, the latter acting as prior on the former. Our experiments on the challenging tasks of conditional human motion prediction and image captioning demonstrate the effectiveness of our approach at avoiding posterior collapse. Video results of our approach are anonymously provided in http://bit.ly/iclr2020