- Abstract: The application of multi-modal generative models by means of a Variational Auto Encoder (VAE) is an upcoming research topic for sensor fusion and bi-directional modality exchange. This contribution gives insights into the learned joint latent representation and shows that expressiveness and coherence are decisive properties for multi-modal datasets. Furthermore, we propose a multi-modal VAE derived from the full joint marginal log-likelihood that is able to learn the most meaningful representation for ambiguous observations. Since the properties of multi-modal sensor setups are essential for our approach but hardly available, we also propose a technique to generate correlated datasets from uni-modal ones.
- Keywords: Multi-Modal Deep Generative Models, Sensor Fusion, Data Generation, VAE
- TL;DR: Deriving a general formulation of a multi-modal VAE from the joint marginal log-likelihood.