Weakly-Supervised Contrastive Learning for Imprecise Class Labels

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 spotlight poster. License: CC BY 4.0
Abstract: Contrastive learning has achieved remarkable success in learning effective representations, with supervised contrastive learning often outperforming self-supervised approaches. However, in real-world scenarios data annotations are often ambiguous or inaccurate, meaning that class labels may not reliably indicate whether two examples belong to the same class. This limitation restricts the applicability of supervised contrastive learning. To address this challenge, we introduce the concept of "continuous semantic similarity" to define positive and negative pairs. Instead of relying directly on imprecise class labels, we measure the semantic similarity between example pairs, which quantifies how closely they belong to the same category, and refine it by iteratively updating weak supervisory signals. Based on this concept, we propose a graph-theoretic framework for weakly-supervised contrastive learning, where semantic similarity serves as the graph weights. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios. We demonstrate its effectiveness through experiments in two common settings, i.e., noisy-label and partial-label learning, where existing methods can be easily integrated to significantly improve performance. Theoretically, we establish an error bound for our approach, showing that it can approximate supervised contrastive learning under mild conditions. The implementation code is available at [https://github.com/Speechless-10308/WSC](https://github.com/Speechless-10308/WSC).
Lay Summary: Real-world data often comes with imperfect labels—like vague tags or mistakes—that confuse AI models. Traditional methods, which rely on precise labels to learn patterns, struggle because they can’t tell which items truly belong together. We created a new AI training method that uses "soft" similarity scores instead of rigid labels. Imagine connecting dots (data points) with lines whose thickness shows how alike they are. Our system adjusts these connections over time, even with messy labels, to better capture true relationships. This approach boosts accuracy in tough scenarios, like when labels are 90% noisy, and works across tasks like image recognition. It’s flexible, easy to add to existing tools, and backed by math to ensure reliability. This makes AI more practical for real-world data, where perfect labels are rare but learning still needs to happen.
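To make the core idea concrete, here is a rough sketch (not the paper's exact objective; function and variable names are hypothetical) of a contrastive loss in which each pair is weighted by a continuous similarity score in [0, 1] rather than a hard same-class indicator, so imprecise labels contribute softly instead of forcing binary positive/negative decisions:

```python
import numpy as np

def weighted_contrastive_loss(features, sim, temperature=0.5):
    """Contrastive loss with continuous pair weights.

    features: (n, d) array of embeddings.
    sim: (n, n) array of similarity weights in [0, 1]; sim[i, j] plays the
         role of the hard "same class" indicator in supervised contrastive
         learning, but may be fractional under weak supervision.
    """
    # L2-normalize embeddings so dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(z)
    # Exclude self-pairs from both the weights and the softmax denominator.
    mask = ~np.eye(n, dtype=bool)
    exp_logits = np.exp(logits) * mask
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    w = sim * mask
    # Weighted average log-probability of each anchor's soft positives.
    per_anchor = (w * log_prob).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-12)
    return -per_anchor.mean()
```

With `sim` set to a 0/1 same-class matrix this reduces to a standard supervised contrastive loss; under weak supervision the similarity matrix would instead be estimated and iteratively refined, as described in the abstract.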
Link To Code: https://github.com/Speechless-10308/WSC
Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning
Keywords: Weakly-supervised learning, Contrastive learning, Noisy label learning, Partial label learning
Submission Number: 2740