Abstract: This thesis introduces the Mutual Information Machine (MIM), an autoencoder model for learning joint distributions over observations and latent states. The model formulation reflects three key design principles: 1) low divergence, or symmetry, to encourage the encoder and decoder to learn consistent factorizations of the same underlying distribution; 2) high mutual information, or approximate invertibility, to encourage an informative relation between data and latent variables; and 3) low marginal entropy, or compression, which tends to encourage clustered latent representations. Taken together, these objectives yield a cross-entropy loss for learning latent variable models. The resulting form of amortized, symmetric variational inference stands in contrast to the use of an evidence lower bound (ELBO) in VAEs, and to the use of adversarial learning common among other models formulated in terms of a symmetric divergence. In this thesis we systematically probe the different terms in the variational bound, providing intuition about MIM. Experiments show that MIM learns latent representations with high mutual information and good unsupervised clustering, while providing data log-likelihoods comparable to those of VAEs. We demonstrate state-of-the-art results on image and language data.