TL;DR: We propose an optimization-based algorithm that identifies false negatives globally by automatically learning, on the fly, a dataset-wide threshold for each anchor.
Abstract: In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can produce negative pairs with similar semantics, referred to as "false negatives", whose embeddings are then falsely pushed apart. To address this issue, we introduce *GloFND*, an optimization-based approach that automatically learns, on the fly, a threshold for each anchor to *identify* its false negatives during training. In contrast to previous methods for false negative discovery, our approach detects false negatives *globally* across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND.
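For concreteness, below is a minimal PyTorch sketch of one plausible instantiation of this idea, assuming the per-anchor threshold is learned as (roughly) the top-alpha quantile of that anchor's negative similarities via stochastic subgradient steps on a quantile objective. The function name, parameters (`lambdas`, `alpha`, `lr`), and update rule are illustrative assumptions, not the paper's exact formulation; see the linked repository for the actual implementation.

```python
import torch

def update_thresholds(lambdas, anchor_ids, sims, neg_mask, alpha=0.01, lr=0.05):
    """One stochastic update of per-anchor thresholds (illustrative sketch).

    lambdas:    (N,) learnable threshold per dataset example, persisted across iterations
    anchor_ids: (B,) dataset indices of the anchors in the current mini-batch
    sims:       (B, B) pairwise cosine similarities within the batch
    neg_mask:   (B, B) True where (i, j) is a candidate negative pair
    """
    lam = lambdas[anchor_ids].unsqueeze(1)                    # (B, 1)
    above = ((sims > lam) & neg_mask).float()                 # indicator per negative pair
    # Fraction of this anchor's sampled negatives that exceed its threshold.
    frac_above = above.sum(1) / neg_mask.float().sum(1).clamp(min=1)
    # Subgradient step on a quantile (pinball-style) objective: in expectation,
    # lambda_i moves until a fraction alpha of the anchor's negatives lie above it.
    lambdas[anchor_ids] -= lr * (alpha - frac_above)
    # Negatives above the learned threshold are flagged as false negatives.
    return (sims > lam) & neg_mask                            # (B, B) FN mask

# Example wiring inside a training loop (shapes only, illustrative):
# fn_mask = update_thresholds(lambdas, ids, sims, neg_mask, alpha=0.01)
# loss = info_nce(sims, pos_mask, neg_mask & ~fn_mask)
```

Flagged pairs can then be dropped from (or down-weighted in) the contrastive loss. Because `lambdas` persists across iterations, each threshold reflects the whole dataset even though every update only touches the current mini-batch, which is consistent with the per-iteration cost being independent of dataset size.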
Lay Summary: Modern AI systems can learn to understand images even without being told what each image shows—a technique known as self-supervised learning. A common method is contrastive learning, where the AI compares images, bringing similar ones closer and pushing different ones apart. This creates a vector representation for each image, useful for tasks like object classification and image–text search. But without labels, the model assumes all images are different, leading to false negatives—related images, like two dog breeds, mistakenly treated as unrelated. These errors weaken learning, and checking all possible image pairs is impractical for large datasets with millions of images.
We present GloFND, a method that automatically detects false negatives during training. Unlike prior approaches that depend on limited samples or intensive computation, GloFND searches across the full dataset by learning a custom similarity threshold for each image. It does so while keeping computation costs low and independent of dataset size.
GloFND improves the quality of vector representations and yields more accurate results on tasks like classification and image–text retrieval. It is easy to apply, handles large datasets effectively, and integrates seamlessly into existing contrastive learning frameworks, boosting their performance.
Link To Code: https://github.com/vibalcam/GloFND
Primary Area: General Machine Learning->Representation Learning
Keywords: False Negative Discovery, Contrastive Learning, Self-supervised Learning, Machine Learning
Submission Number: 5301