Uniform Noise Distribution and Compact Clusters: Unveiling the Success of Self-Supervised Learning in Label Noise

Published: 20 May 2025, Last Modified: 20 May 2025Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Label noise is ubiquitous in real-world datasets, posing significant challenges to machine learning models. While self-supervised learning (SSL) algorithms have empirically demonstrated effectiveness in learning noisy labels, the theoretical understanding of their effectiveness remains underexplored. In this paper, we present a theoretical framework to understand how SSL methods enhance learning with noisy labels, especially for the instance-dependent label noise. We reveal that the uniform and compact cluster structures induced by contrastive SSL play a crucial role in mitigating the adverse effects of label noise. Specifically, we theoretically show that a classifier trained on SSL-learned representations significantly outperforms one trained using traditional supervised learning methods. This results from two key merits of SSL representations over label noise: 1. Uniform Noise Distribution: Label noise becomes uniformly distributed over SSL representations with respect to the true class labels, rather than the noisy ones, leading to an easier learning task. 2. Enhanced Cluster Structure: SSL enhances the formation of well-separated and compact categorical clusters, increasing inter-class distances while tightening intra-class clusters. We further theoretically justify the benefits of training a classifier on such structured representations, demonstrating that it encourages the classifier trained on noisy data to be aligned with the optimal classifier. Extensive experiments validate the robustness of SSL representations in combating label noise, confirming the practical values of our theoretical findings.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Following the suggestions and comments from the reviewers, editor, and the TMLR guideline, we revised the whole paper to be a camera-ready version.
Supplementary Material: pdf
Assigned Action Editor: ~Gang_Niu1
Submission Number: 3955
Loading