Contrastive Multiview Coding

Yonglong Tian; Dilip Krishnan; Phillip Isola

Contrastive Multiview Coding

Yonglong Tian, Dilip Krishnan, Phillip Isola

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Keywords: Representation Learning, Unsupervised Learning, Self-supervsied Learning, Multiview Learning

TL;DR: An unsupervised/self-supervised framework for learning representations from multiple views

Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We hypothesize that a powerful representation is one that models view-invariant factors. Based on this hypothesis, we investigate a contrastive coding scheme, in which a representation is learned that aims to maximize mutual information between different views but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. The resulting learned representations perform above the state of the art for downstream tasks such as object classification, compared to formulations based on predictive learning or single view reconstruction, and improve as more views are added. On the Imagenet linear readoff benchmark, we achieve 68.4% top-1 accuracy.

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 9 code implementations](https://www.catalyzex.com/paper/contrastive-multiview-coding/code)

Original Pdf: pdf

7 Replies

Loading