TL;DR: We derive generalization bounds for CRL when training tuples must be built from a fixed pool of labeled data points that are reused across tuples
Abstract: Contrastive Representation Learning (CRL) has achieved impressive success in various domains in recent years. Nevertheless, the theoretical understanding of the generalization behavior of CRL remains limited. Moreover, to the best of our knowledge, the current literature analyzes generalization bounds only under the assumption that the data tuples used for contrastive learning are independently and identically distributed. In practice, however, we are often limited to a fixed pool of reusable labeled data points, making it inevitable to recycle data across tuples in order to create sufficiently large datasets; the tuple-wise independence condition imposed by previous works is therefore violated. In this paper, we provide a generalization analysis of the CRL framework in a non-$i.i.d.$ setting that more realistically reflects practice. Drawing inspiration from the literature on U-statistics, we derive generalization bounds indicating that the required number of samples in each class scales as the logarithm of the covering number of the class of learnable feature representations associated with that class. We then apply our main results to derive excess risk bounds for common function classes such as linear maps and neural networks.
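For context, a standard formulation of the unsupervised CRL objective in the theoretical literature evaluates a representation $f$ on tuples consisting of an anchor $x$, a positive $x^+$ drawn from the same class, and $k$ negatives $x_1^-,\dots,x_k^-$. The exact loss analyzed in this submission may differ, so the following is only an illustrative sketch:

```latex
% Illustrative sketch of a common tuple-wise contrastive loss
% (not necessarily the exact objective analyzed in this paper):
\[
  \mathcal{L}_{\mathrm{un}}(f)
  \;=\;
  \mathbb{E}\!\left[
    \ell\!\Big(
      \big\{\, f(x)^{\top}\!\big(f(x^{+}) - f(x^{-}_{i})\big) \,\big\}_{i=1}^{k}
    \Big)
  \right],
\]
where $\ell$ is, e.g., the logistic loss
$\ell(\mathbf{v}) = \log\!\big(1 + \textstyle\sum_{i=1}^{k} e^{-v_i}\big)$.
```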
Lay Summary: Contrastive Representation Learning (CRL) is a powerful machine learning framework that enhances data representations by pulling similar data pairs together while pushing dissimilar pairs apart. This requires each training dataset to take the form of a collection of small groups, where each group is composed of two similar objects (referred to as ‘anchors’) together with a set of other objects that are known to be very different from the two anchor objects. We study CRL in the context of generalization theory, which is concerned with estimating the amount of data necessary for models to attain a desirable level of performance (also referred to as the ‘sample complexity’). Previous works have explored CRL settings where the groups are independent of each other. In our work, we study the setting where the groups are formed from a finite pool of labeled examples, allowing the objects to be recycled across groups and thereby breaking the assumption of statistical independence that is central to classical learning theory. Under some assumptions on the proportion of objects in each class, we show that the sample complexity is no worse than in the fully independent setting. Experimentally, we demonstrate that models which reuse objects across groups can outperform models which do not.
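To illustrate the data-reuse scheme described in the lay summary, the following minimal sketch builds contrastive tuples from a fixed pool of labeled examples, recycling points across tuples. The function and parameter names (`build_tuples`, `num_negatives`, `seed`) are hypothetical, and the sampling procedure is an assumption for illustration only, not the submission's exact construction.

```python
import random
from collections import defaultdict

def build_tuples(pool, num_tuples, num_negatives, seed=0):
    """Form contrastive tuples (anchor, positive, negatives) from a fixed
    labeled pool, reusing the same data points across tuples.

    pool: list of (x, label) pairs -- the only data available.
    Returns a list of (anchor, positive, [negatives]) tuples.
    """
    rng = random.Random(seed)

    # Group the pool by class label.
    by_class = defaultdict(list)
    for x, y in pool:
        by_class[y].append(x)
    # Only classes with at least two points can supply an anchor/positive pair.
    labels = [y for y, xs in by_class.items() if len(xs) >= 2]

    tuples = []
    for _ in range(num_tuples):
        # Anchor and positive share a class; negatives come from other classes.
        y = rng.choice(labels)
        anchor, positive = rng.sample(by_class[y], 2)
        other = [x for y2, xs in by_class.items() if y2 != y for x in xs]
        negatives = [rng.choice(other) for _ in range(num_negatives)]
        tuples.append((anchor, positive, negatives))
    return tuples

# A pool of 6 labeled points yields many overlapping tuples, so the
# resulting tuples are *not* statistically independent of one another.
pool = [(f"x{i}", i % 3) for i in range(6)]
print(build_tuples(pool, num_tuples=4, num_negatives=2))
```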
Primary Area: Theory->Learning Theory
Keywords: Contrastive Learning, Generalization Analysis
Submission Number: 15584