Keywords: variational autoencoder, multimodal data, product-of-experts, semi-supervised learning
Abstract: Multimodal generative models should learn a meaningful latent representation that enables coherent joint generation of all modalities (e.g., images and text). Many applications also require accurately sampling modalities conditioned on observations of a subset of the modalities. Often, not all modalities are observed for every training data point, so semi-supervised learning should be possible.
In this study, we evaluate a family of product-of-experts (PoE) based variational autoencoders that have these desired properties, including a novel PoE-based architecture and training procedure. An empirical evaluation shows that the PoE-based models can outperform an additive mixture-of-experts (MoE) approach.
Our experiments support the intuition that PoE models are better suited for a conjunctive combination of modalities, while MoEs are better suited for a disjunctive fusion.
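To make the conjunctive/disjunctive contrast concrete, here is a minimal sketch, assuming each modality encoder outputs a diagonal Gaussian (mean, variance) over the latent space, as is common in PoE and MoE VAEs. The names `poe` and `moe_moments` are hypothetical, the standard-normal "prior expert" is an assumption borrowed from common PoE VAE practice, and this is not the specific architecture proposed in this submission.

```python
from typing import List, Tuple

# A diagonal Gaussian expert: (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def poe(experts: List[Gaussian], prior: bool = True) -> Gaussian:
    """Product of diagonal Gaussian experts (conjunctive fusion).

    Precisions add, and the fused mean is the precision-weighted average
    of the expert means. Optionally includes a standard-normal prior expert
    (an assumption, following common PoE VAE practice).
    """
    dim = len(experts[0][0])
    all_experts = list(experts)
    if prior:
        all_experts.append(([0.0] * dim, [1.0] * dim))
    mean, var = [], []
    for d in range(dim):
        precision = sum(1.0 / e[1][d] for e in all_experts)
        weighted_sum = sum(e[0][d] / e[1][d] for e in all_experts)
        var.append(1.0 / precision)
        mean.append(weighted_sum / precision)
    return mean, var


def moe_moments(experts: List[Gaussian]) -> Gaussian:
    """Mean and variance of a uniform mixture of experts (disjunctive fusion).

    The mixture is not Gaussian; only its first two moments are returned,
    per dimension.
    """
    k = len(experts)
    dim = len(experts[0][0])
    mean, var = [], []
    for d in range(dim):
        mu = sum(e[0][d] for e in experts) / k
        second_moment = sum(e[1][d] + e[0][d] ** 2 for e in experts) / k
        mean.append(mu)
        var.append(second_moment - mu * mu)
    return mean, var


image_expert: Gaussian = ([1.0], [1.0])
text_expert: Gaussian = ([3.0], [1.0])
print(poe([image_expert, text_expert], prior=False))  # ([2.0], [0.5])
print(moe_moments([image_expert, text_expert]))       # ([2.0], [2.0])
print(poe([image_expert], prior=False))               # ([1.0], [1.0])
```

With two experts that disagree (means 1 and 3, unit variance), the product concentrates between them with variance 0.5, whereas the mixture spreads over both modes with variance 2. In this sketch, conditioning on an observed subset of modalities amounts to passing only the corresponding experts to `poe`.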
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: Product-of-experts based variational autoencoders work well for generative modelling of multiple high-dimensional modalities
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=tB_D5RDd_r