InfoNCE is a variational autoencoder

TMLR Paper 449 Authors

20 Sept 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: There are two main approaches to self-supervised learning (SSL): generative SSL, where we learn a full probabilistic model of the inputs, and contrastive SSL, where we train on a supervised learning task carefully designed to encourage good representations. We reconcile generative and contrastive SSL by showing that contrastive SSL methods (including InfoNCE) that are motivated in terms of maximizing the mutual information (MI) implicitly learn a full probabilistic model of the inputs, parameterised as a variational autoencoder (VAE). In particular, when we learn the optimal prior, the VAE objective (the ELBO) becomes equal to the MI. In turn, we show that for a deterministic encoder the ELBO is equal to the log Bayesian model evidence. This establishes a profound connection between Bayesian inference and information theory. However, practical InfoNCE methods do not use the MI as an objective: the MI is invariant to arbitrary invertible transformations, so using an MI objective can lead to highly entangled representations (Tschannen et al., 2019). Instead, the actual InfoNCE objective is a simplified lower bound on the MI which is loose even in the infinite-sample limit. This raises an important question: does it really make sense to motivate an objective that works (i.e. the actual InfoNCE objective) as a loose bound on an objective that does not work (i.e. the true MI, which gives arbitrarily entangled representations)? We give an alternative motivation for the actual InfoNCE objective. In particular, we show that in the infinite-sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the log Bayesian model evidence. Thus, we argue that it makes sense to motivate InfoNCE from our VAE perspective (as the actual InfoNCE objective is equal to the log Bayesian model evidence), rather than from the MI perspective (as the actual InfoNCE objective forms only a loose bound on the MI).
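For orientation, the two standard objectives the abstract relates can be written as follows. This is a minimal sketch in generic notation from the contrastive-learning and VAE literature, not necessarily the paper's own: f is a critic, q_phi the encoder, p_theta the generative model, and K the number of samples in a batch.

% Standard InfoNCE objective (Oord et al., 2018), multi-sample form: a lower
% bound on the mutual information that can never exceed log K.
\mathcal{L}_{\text{InfoNCE}}
  = \mathbb{E}\!\left[\frac{1}{K}\sum_{i=1}^{K}
      \log \frac{e^{f(x_i, z_i)}}{\frac{1}{K}\sum_{j=1}^{K} e^{f(x_i, z_j)}}\right]
  \;\le\; I(X; Z),
  \qquad
  \mathcal{L}_{\text{InfoNCE}} \le \log K.

% Standard VAE objective (the ELBO): a lower bound on the log model evidence.
\log p_\theta(x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\middle\|\, p_\theta(z)\right)
  = \mathrm{ELBO}(\theta, \phi; x).

In these terms, the abstract's claims are that the ELBO equals the MI when the prior is learned optimally, that it equals the log model evidence log p_theta(x) when the encoder is deterministic, and that in the infinite-sample limit, for a particular choice of prior, the actual InfoNCE objective itself equals the log model evidence.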
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andriy_Mnih1
Submission Number: 449