SimVAE: Narrowing the gap between Discriminative & Generative Self-Supervised Representation Learning

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Self-supervised representation learning; Variational Inference; Probabilistic Generative Modelling; Contrastive Learning;
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Motivated by a theoretical analysis of existing self-supervised representation learning methods, we propose a unifying graphical model which improves the performance of VAE-based methods on downstream tasks.
Abstract: Self-supervised representation learning is a powerful paradigm that leverages the relationship between semantically similar data, such as augmentations, extracts of an image or sound clip, or multiple views/modalities. Recent methods, e.g. SimCLR, CLIP and DINO, have made significant strides, yielding representations that achieve state-of-the-art results on multiple downstream tasks. A number of self-supervised discriminative approaches have been proposed, e.g. instance discrimination, latent clustering and contrastive methods; though often intuitive, a comprehensive theoretical understanding of their underlying mechanisms or of what they learn remains elusive. Meanwhile, generative approaches, such as variational autoencoders (VAEs), fit a specific latent variable model and have principled appeal, but lag significantly in terms of performance. We present a theoretical analysis of self-supervised discriminative methods and a graphical model that reflects the assumptions they implicitly make, providing a unifying theoretical framework for these methods. We show that fitting this model to the data improves representations over previous VAE-based methods on several common benchmarks (MNIST, FashionMNIST, CIFAR10, Celeb-A), narrowing the gap to discriminative methods. We illustrate how generatively learned representations offer the promise of preserving more information than discriminative approaches.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1767