The Trade-off between Label Efficiency and Universality of Representations from Contrastive Learning
Keywords: Contrastive Learning, Self-Supervised Learning, Foundation Model, Complexity
TL;DR: We focus on contrastive learning and systematically study a trade-off between label efficiency and universality both empirically and theoretically.
Abstract: The pre-trained representation learning paradigm is a popular recent approach to addressing distribution shift and limited training data. It first pre-trains a representation function on large unlabeled datasets from multiple tasks via self-supervised (e.g., contrastive) learning, and then learns a simple classifier on top of the representation using small labeled datasets from the downstream target tasks. The representation should have two key properties: label efficiency (the ability to learn an accurate classifier with a small amount of labeled data) and universality (usefulness across a wide range of downstream tasks). In this paper, we focus on contrastive learning and systematically study the trade-off between label efficiency and universality, both theoretically and empirically. We empirically show that this trade-off exists across different models and datasets. Theoretically, we propose a data model with a hidden representation and provide an analysis in a simplified linear setting. Our analysis shows that, compared to pre-training on the target task alone, pre-training on diverse tasks incurs a larger sample complexity for learning the optimal classifier, and hence worse prediction performance on the target task.
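The two-stage paradigm described in the abstract can be sketched in a toy linear setting. Everything below is an illustrative assumption rather than the paper's exact construction: a hidden vector z generates observations x = Az + noise, stage 1 learns a linear representation from unlabeled positive pairs (using a spectral estimate of the cross-view covariance as a simple stand-in for a contrastive objective, to which it is closely related in linear settings), and stage 2 fits a linear probe on a small labeled set.

```python
import numpy as np

# Hypothetical linear data model: hidden z in R^k, observed x = A z + noise,
# downstream label y = sign(w . z). Names (A, w, U) are illustrative only.
rng = np.random.default_rng(0)
d, k = 20, 3                      # observed and hidden dimensions
A = rng.normal(size=(d, k))
w = rng.normal(size=k)            # hidden linear labeling direction

def views(n, noise=0.1):
    """Two noisy 'views' of the same hidden z (a contrastive positive pair)."""
    z = rng.normal(size=(n, k))
    x1 = z @ A.T + noise * rng.normal(size=(n, d))
    x2 = z @ A.T + noise * rng.normal(size=(n, d))
    return z, x1, x2

# Stage 1: self-supervised pre-training on unlabeled pairs.
# The cross-view covariance estimates A A^T; its top-k eigenvectors span the
# signal subspace and serve as the learned representation r(x) = U^T x.
_, x1, x2 = views(2000)
C = (x1.T @ x2) / len(x1)
C = (C + C.T) / 2                 # symmetrize the empirical estimate
eigvals, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, -k:]               # top-k eigenvectors (eigh sorts ascending)

# Stage 2: linear probe on a small labeled sample from the target task.
z_tr, x_tr, _ = views(100)
z_te, x_te, _ = views(500)
y_tr = np.sign(z_tr @ w)
y_te = np.sign(z_te @ w)
coef, *_ = np.linalg.lstsq(x_tr @ U, y_tr, rcond=None)
acc = np.mean(np.sign(x_te @ U @ coef) == y_te)
print(f"linear-probe accuracy with 100 labels: {acc:.2f}")
```

With only 100 labels the probe recovers the hidden labeling direction well, illustrating label efficiency when the pre-training distribution matches the target task; pre-training on a mixture of unrelated tasks would spread the representation's capacity across more directions, which is the trade-off the paper analyzes.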