Multi-modal Variational Encoder-Decoders

Iulian V. Serban; Alexander G. Ororbia II; Joelle Pineau; Aaron Courville

03 Jul 2025 (modified: 21 Jul 2022) | Submitted to ICLR 2017 | Readers: Everyone

Abstract: Recent advances in neural variational inference have facilitated efficient training of powerful directed graphical models with continuous latent variables, such as variational autoencoders. However, these models usually assume simple, uni-modal priors — such as the multivariate Gaussian distribution — yet many real-world data distributions are highly complex and multi-modal. Examples of complex and multi-modal distributions range from topics in newswire text to conversational dialogue responses. When such latent variable models are applied to these domains, the restriction of the simple, uni-modal prior hinders the overall expressivity of the learned model as it cannot possibly capture more complex aspects of the data distribution. To overcome this critical restriction, we propose a flexible, simple prior distribution which can be learned efficiently and potentially capture an exponential number of modes of a target distribution. We develop the multi-modal variational encoder-decoder framework and investigate the effectiveness of the proposed prior in several natural language processing modeling tasks, including document modeling and dialogue modeling.

TL;DR: Learning continuous multimodal latent variables in the variational auto-encoder framework for text processing applications.

Conflicts: umontreal.ca, psu.edu, cs.mcgill.ca

Keywords: Deep learning, Structured prediction, Natural language processing
