How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY-SA 4.0
Abstract: In recent years, contrastive learning has achieved state-of-the-art performance in self-supervised representation learning. Many previous works have attempted to provide a theoretical understanding of the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of its failure is called the labeling error) due to the strength and randomness of common augmentation strategies such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on the downstream classification risk. To mitigate these impacts, we apply a data dimensionality reduction method (e.g., singular value decomposition, SVD) to the original data to reduce the number of false positive samples, and we provide both theoretical and empirical evaluations. Moreover, we find that SVD acts as a double-edged sword: it may also degrade downstream classification accuracy by reducing the connectivity of the augmentation graph. Based on these observations, we suggest using a moderate embedding dimension (such as $512$ or $1024$ in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and a small labeling error, thereby improving model performance.
Lay Summary: Contrastive learning has recently achieved state-of-the-art performance without label information. How does the label inconsistency (labeling error) caused by data augmentation affect contrastive learning? We set out to answer this question theoretically from the perspective of data dimensionality reduction. This paper reveals several negative impacts of labeling error on downstream prediction performance. Surprisingly, our theoretical analysis shows that traditional data dimensionality reduction methods, such as singular value decomposition (SVD), can mitigate these negative impacts. Meanwhile, extensive experiments under various settings validate our theory. Our theoretical analysis also offers practical suggestions for contrastive training, e.g., adopting a moderate embedding dimension (such as $512$ or $1024$ in our experiments), data inflation, weak augmentation, and SVD.
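The paper's actual preprocessing pipeline and hyperparameters are not reproduced here; the following is only a minimal sketch, under assumed settings, of what SVD-based dimensionality reduction of image data could look like before augmentation. The per-channel treatment and the retained rank (`rank=16`) are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def svd_low_rank(image: np.ndarray, rank: int) -> np.ndarray:
    """Reconstruct a single-channel image from its top-`rank` singular components."""
    U, S, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

def reduce_dataset(images: np.ndarray, rank: int = 16) -> np.ndarray:
    """Apply per-channel rank-`rank` SVD reconstruction to a batch of images
    of shape (N, H, W, C) before they enter the augmentation pipeline
    (e.g., random resized crop for contrastive training)."""
    out = np.empty_like(images, dtype=np.float64)
    for n in range(images.shape[0]):
        for c in range(images.shape[-1]):
            out[n, ..., c] = svd_low_rank(images[n, ..., c].astype(np.float64), rank)
    return out

# Usage example on a random CIFAR-sized batch (4 images, 32x32, RGB).
batch = np.random.rand(4, 32, 32, 3)
reduced = reduce_dataset(batch, rank=16)
print(reduced.shape)  # (4, 32, 32, 3)
```

The intuition, as described in the abstract, is that low-rank reconstruction discards fine-grained content that strong augmentations would otherwise turn into false positive pairs, at the possible cost of reduced augmentation-graph connectivity.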
Primary Area: Deep Learning->Self-Supervised Learning
Keywords: Contrastive learning, labeling error, data dimensionality reduction, data augmentation
Submission Number: 14356