Keywords: multimodal learning, variational autoencoder, deep generative models
Abstract: Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with a large number of modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. We focus on the mixture-of-experts multimodal VAE (MMVAE), which achieves good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the MMVAE that improves its generative quality, while maintaining high semantic coherence. For this, shared and modality-specific information is modelled in separate latent subspaces. In contrast to previous approaches with separate subspaces, our model is robust to changes in latent dimensionality and regularization hyperparameters. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
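The core idea the abstract describes, separating shared from modality-specific information in distinct latent subspaces and combining unimodal posteriors via a mixture of experts, can be sketched in a few lines. The sketch below is a minimal illustration under assumptions not stated in the abstract: the dimensions, the linear "encoders"/"decoders" (stand-ins for neural networks), and all function names are hypothetical, and the mixture-of-experts draw is simplified to a uniform choice of one modality's deterministic code.

```python
import numpy as np

rng = np.random.default_rng(0)

D_X, D_SHARED, D_PRIVATE = 8, 4, 2   # data dim, shared / modality-specific latent dims
N_MODALITIES = 2

# Toy linear maps standing in for per-modality encoders and decoders.
# A real multimodal VAE would use neural networks emitting posterior
# means and variances; these matrices only illustrate the wiring.
enc_shared = [rng.normal(size=(D_SHARED, D_X)) for _ in range(N_MODALITIES)]
enc_private = [rng.normal(size=(D_PRIVATE, D_X)) for _ in range(N_MODALITIES)]
dec = [rng.normal(size=(D_X, D_SHARED + D_PRIVATE)) for _ in range(N_MODALITIES)]

def encode(m, x):
    """Split modality m's input into a shared code and a private code."""
    return enc_shared[m] @ x, enc_private[m] @ x

def moe_shared_code(x_per_modality):
    """Mixture-of-experts over the shared subspace: draw one modality
    uniformly and use its shared code (here deterministic for brevity)."""
    m = rng.integers(len(x_per_modality))
    z, _ = encode(m, x_per_modality[m])
    return z

def cross_generate(src_m, tgt_m, x_src):
    """Conditional generation target <- source: reuse the source modality's
    shared code; sample the target's private code from its prior, so
    modality-specific variation is not forced through the shared space."""
    z, _ = encode(src_m, x_src)
    w = rng.normal(size=D_PRIVATE)  # modality-specific prior sample
    return dec[tgt_m] @ np.concatenate([z, w])

x0 = rng.normal(size=D_X)
x1_hat = cross_generate(0, 1, x0)   # generate modality 1 from modality 0
```

Sampling the private code from its prior during cross-modal generation is what restores sample diversity: the shared code pins down the semantics, while the private draw varies the modality-specific appearance.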