Abstract: The application of multi-modal generative models based on the Variational Autoencoder (VAE) is an emerging research topic for sensor fusion and bi-directional modality exchange.
This contribution gives insights into the learned joint latent representation and shows that expressiveness and coherence are decisive properties of multi-modal datasets.
Furthermore, we propose a multi-modal VAE, derived from the full joint marginal log-likelihood, that learns the most meaningful representation for ambiguous observations.
Since datasets with the properties of multi-modal sensor setups are essential for our approach but rarely available, we also propose a technique to generate correlated multi-modal datasets from uni-modal ones.
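
For orientation, a minimal sketch of the kind of bound the abstract refers to: the ELBO obtained from the joint marginal log-likelihood of $M$ modalities $x_1,\dots,x_M$ with a shared latent $z$. The conditional factorization of the decoder is an illustrative assumption, not necessarily the paper's exact derivation:

```latex
% Joint ELBO for M modalities sharing one latent z; the decoder is
% assumed to factorize conditionally on z (illustrative assumption):
\log p_\theta(x_1,\dots,x_M)
  \geq \mathbb{E}_{q_\phi(z \mid x_1,\dots,x_M)}
       \left[ \sum_{m=1}^{M} \log p_\theta(x_m \mid z) \right]
     - \mathrm{KL}\!\left( q_\phi(z \mid x_1,\dots,x_M) \,\middle\|\, p(z) \right)
```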
Keywords: Multi-Modal Deep Generative Models, Sensor Fusion, Data Generation, VAE
TL;DR: Deriving a general formulation of a multi-modal VAE from the joint marginal log-likelihood.
Data: [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist)
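
Since the generation technique itself is not spelled out in the abstract, the following is only a hypothetical illustration of the idea: building a correlated bi-modal dataset from uni-modal Fashion-MNIST by pairing each image with a randomly drawn image of the same class. The pairing rule and the helper `make_bimodal_pairs` are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch: derive a correlated bi-modal dataset from
# uni-modal Fashion-MNIST by class-conditioned pairing.
import numpy as np
from torchvision.datasets import FashionMNIST

def make_bimodal_pairs(images: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Return (mod_a, mod_b) where mod_b[i] shares the class of mod_a[i]."""
    rng = np.random.default_rng(seed)
    # Index pool per class, then draw one same-class partner per sample.
    by_class = {int(c): np.flatnonzero(labels == c) for c in np.unique(labels)}
    partner = np.array([rng.choice(by_class[int(c)]) for c in labels])
    return images, images[partner]

ds = FashionMNIST(root="./data", train=True, download=True)
images = ds.data.numpy()      # (60000, 28, 28) uint8
labels = ds.targets.numpy()   # (60000,)
mod_a, mod_b = make_bimodal_pairs(images, labels)
print(mod_a.shape, mod_b.shape)  # (60000, 28, 28) each
```

Class-conditioned pairing yields observations that are correlated through a shared underlying factor (the class) without being identical, mimicking two sensors observing the same latent state.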