Abstract: This thesis introduces the Mutual Information Machine (MIM), an autoencoder model for learning joint distributions over observations and latent states. The model formulation reflects three key design principles: 1) low divergence, or symmetry, to encourage the encoder and decoder to learn consistent factorizations of the same underlying distribution; 2) high mutual information, or approximate invertibility, to encourage an informative relation between data and latent variables; and 3) low marginal entropy, or compression, which tends to encourage clustered latent representations. Taken together, these objectives yield a cross-entropy loss for learning latent variable models. The resulting form of amortized, symmetric variational inference stands in contrast to the use of an evidence lower bound (ELBO) in VAEs, and to the use of adversarial learning common among other models formulated in terms of a symmetric divergence. In this thesis we systematically probe the different terms in the variational bound, providing intuition about MIM. Experiments show that MIM learns latent representations with high mutual information and good unsupervised clustering, while providing data log-likelihoods comparable to those of VAEs. We demonstrate state-of-the-art results on image and language data.