How Does Contrastive Pre-training Connect Disparate Domains?

Kendrick Shen; Robbie Matthew Jones; Ananya Kumar; Sang Michael Xie; Percy Liang

Published: 02 Dec 2021, Last Modified: 05 May 2023, NeurIPS 2021 Workshop DistShift Poster

Keywords: pre-training, contrastive learning, robustness, out-of-distribution, domain shift

TL;DR: Off-the-shelf contrastive pre-training is a competitive method for domain adaptation, and we develop a connectivity framework to understand how it learns representations that generalize across domains.

Abstract: Pre-training on massive unlabeled datasets greatly improves accuracy under distribution shifts. As a first step toward understanding why, we study a popular pre-training method, contrastive learning, in the unsupervised domain adaptation (UDA) setting, where only labeled data from a source domain and unlabeled data from a target domain are available. We begin by showing on four benchmark datasets that out-of-the-box contrastive pre-training (even without large-scale unlabeled data) is competitive with other UDA methods. Classical UDA methods such as domain adversarial training are motivated by the intuition that bringing the domains together in feature space improves generalization from source to target. Surprisingly, we find that contrastive pre-training learns features that are very far apart between the source and target domains. How, then, does contrastive learning improve robustness to distribution shift? We develop a conceptual model for contrastive learning under domain shift, in which data augmentations form connections between classes and domains that can be far apart. We propose a new measure of connectivity, the relative connection strengths between same and different classes across domains. This measure governs the success of contrastive pre-training for domain adaptation in a simple example and strongly correlates with our results on benchmark datasets.
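The connectivity idea can be illustrated with a small numerical sketch. This is a hypothetical toy construction, not the authors' code or their exact definition of connectivity. We assume an augmentation graph with four nodes, one per (domain, class) group, and hypothetical edge weights: `rho` within a group, `alpha` between the same class across domains, `beta` between different classes within a domain, and `gamma` between different classes across domains. In place of a trained contrastive encoder, we use the rank-k spectral embedding of this graph, which is the form the optimal features take (up to rotation) under the spectral contrastive loss analyzed by HaoChen et al. (2021). A linear probe is then fit on the two source groups by minimum-norm least squares and evaluated on the two target groups.

```python
import numpy as np

# Hypothetical toy model (not from the paper): each node is one (domain, class) group.
# Node order: (source, class 0), (source, class 1), (target, class 0), (target, class 1)
DOMAINS = np.array([0, 0, 1, 1])  # 0 = source, 1 = target
LABELS = np.array([0, 1, 0, 1])


def toy_augmentation_graph(rho, alpha, beta, gamma):
    """Symmetric adjacency matrix with hypothetical connection strengths.

    rho:   within a group
    alpha: same class, different domain
    beta:  different class, same domain
    gamma: different class, different domain
    """
    return np.array([
        [rho,   beta,  alpha, gamma],
        [beta,  rho,   gamma, alpha],
        [alpha, gamma, rho,   beta],
        [gamma, alpha, beta,  rho],
    ])


def spectral_features(adj, k):
    """Rank-k spectral embedding D^{-1/2} U_k sqrt(Lambda_k) of D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(norm_adj)  # eigenvalues in ascending order
    top_vals, top_vecs = vals[-k:], vecs[:, -k:]
    return d_inv_sqrt[:, None] * top_vecs * np.sqrt(np.clip(top_vals, 0.0, None))


def source_to_target_accuracy(adj, k=3):
    """Fit a linear probe on source groups only, then score it on target groups."""
    feats = spectral_features(adj, k)
    src, tgt = DOMAINS == 0, DOMAINS == 1
    y_src = np.where(LABELS[src] == 0, 1.0, -1.0)  # +1 for class 0, -1 for class 1
    # Minimum-norm least-squares solution (the system is underdetermined)
    w, *_ = np.linalg.lstsq(feats[src], y_src, rcond=None)
    pred_tgt = np.where(feats[tgt] @ w > 0, 0, 1)
    return float(np.mean(pred_tgt == LABELS[tgt]))


if __name__ == "__main__":
    connected = toy_augmentation_graph(rho=1.0, alpha=0.3, beta=0.2, gamma=0.05)
    misconnected = toy_augmentation_graph(rho=1.0, alpha=0.05, beta=0.3, gamma=0.2)
    print("alpha largest:", source_to_target_accuracy(connected))     # 1.0
    print("alpha smallest:", source_to_target_accuracy(misconnected))  # 0.0
```

In the first configuration, the direction that separates classes is among the top three eigenvectors, so the source-fit probe transfers even though source and target features still differ along a domain direction. In the second, cross-domain connections are stronger between different classes than between the same class, and in this toy setting the probe flips every target label.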
