Deep Generative Models for learning Coherent Latent Representations from Multi-Modal Data

Timo Korthals; Marc Hesse; Jürgen Leitner

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: Multi-modal generative models based on the Variational Autoencoder (VAE) are an emerging research topic for sensor fusion and bi-directional modality exchange. This contribution gives insights into the learned joint latent representation and shows that expressiveness and coherence are decisive properties for multi-modal datasets. Furthermore, we propose a multi-modal VAE, derived from the full joint marginal log-likelihood, that is able to learn the most meaningful representation for ambiguous observations. Since datasets with the properties of multi-modal sensor setups are essential for our approach but hardly available, we also propose a technique to generate correlated datasets from uni-modal ones.

Keywords: Multi-Modal Deep Generative Models, Sensor Fusion, Data Generation, VAE

TL;DR: Deriving a general formulation of a multi-modal VAE from the joint marginal log-likelihood.

Data: [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist)
