Track: Extended abstract
Keywords: identifiability, SSL theory, Structure-inducing learning, InfoNCE, Contrastive Learning
TL;DR: We generalize previous identifiability results for contrastive learning toward anisotropic latents that better capture the effect of augmentations used in practical applications, thereby reducing the gap between theory and practice.
Abstract: Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under certain assumptions, the learned representations uncover the ground-truth latent factors. We argue these theories overlook crucial aspects of how CL is deployed in practice. Specifically, they assume that, within a positive pair, either all latent factors vary to a similar extent or some do not vary at all. However, in practice, positive pairs are often generated using augmentations such as strong cropping to just a few pixels. Hence, a more realistic assumption is that all latent factors change, with a continuum of variability across these factors. We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in this anisotropic setting, broadly generalizing previous identifiability results in CL. AnInfoNCE learns to embed the data into a spherical latent space through a trainable similarity metric. We validate our identifiability results in controlled experiments and show that AnInfoNCE increases the recovery of previously collapsed information on CIFAR10 and ImageNet, albeit at the cost of downstream accuracy.
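To make the idea of a trainable similarity metric on a spherical latent space concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the form of the metric, so the per-dimension weighted squared distance, the function name `weighted_infonce`, and the `log_weights` parameter are all assumptions made for illustration. With uniform weights, the loss reduces to InfoNCE with cosine similarity, which is one way such a loss can generalize the isotropic case.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    # Project embeddings onto the unit hypersphere.
    return z / np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), eps)

def weighted_infonce(z_anchor, z_positive, log_weights):
    """InfoNCE-style loss with a per-dimension weighted squared distance.

    z_anchor, z_positive: arrays of shape (batch, dim); row i of each
        forms a positive pair, all other rows act as negatives.
    log_weights: array of shape (dim,), a hypothetical trainable
        parameter (assumption, not taken from the source).
    """
    za = l2_normalize(z_anchor)
    zp = l2_normalize(z_positive)
    w = np.exp(log_weights)                      # keep weights positive
    diff = za[:, None, :] - zp[None, :, :]       # (batch, batch, dim)
    logits = -np.einsum("ijd,d->ij", diff ** 2, w)
    # Numerically stable log-softmax over candidates for each anchor.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for anchor i sits on the diagonal.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 5))
positives = anchors + 0.1 * rng.normal(size=(8, 5))
loss = weighted_infonce(anchors, positives, np.zeros(5))
```

Setting `log_weights` to zeros gives every dimension the same weight; in a training loop these values would be optimized jointly with the encoder, letting the similarity weight some latent dimensions more than others.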
Submission Number: 13