Keywords: contrastive learning, hard negative mining, mutual information, lower bound, detection, segmentation, MoCo
Abstract: Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two transformations of an image. NCE typically uses randomly sampled negative examples to normalize the objective, but these may include many uninformative examples that are either too easy or too hard to discriminate. Taking inspiration from metric learning, we show that choosing semi-hard negatives can yield stronger contrastive representations. To do this, we introduce a family of mutual information estimators that sample negatives conditionally, in a "ring" around each positive. We prove that these estimators remain lower bounds of mutual information, with higher bias but lower variance than NCE. Experimentally, we find that our approach, applied on top of existing models (IR, CMC, and MoCo), improves accuracy by 2-5 absolute percentage points in each case, measured by linear evaluation on four standard image benchmarks. Moreover, we find continued benefits when transferring features to a variety of new image distributions from the Meta-Dataset collection and to a variety of downstream tasks such as object detection, instance segmentation, and key-point detection.
One-sentence Summary: Theoretical and experimental evidence that choosing difficult negative examples in contrastive learning can yield stronger representations, as measured on several downstream tasks and image distributions.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Data: [COCO](https://paperswithcode.com/dataset/coco), [ImageNet](https://paperswithcode.com/dataset/imagenet), [Meta-Dataset](https://paperswithcode.com/dataset/meta-dataset)
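
The "ring" of semi-hard negatives described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`ring_negative_indices`, `nce_loss`), the use of percentile thresholds on dot-product similarity to define the ring, the temperature value, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def ring_negative_indices(query, candidates, lower_pct=50.0, upper_pct=90.0):
    """Indices of candidates whose similarity to `query` falls inside the ring.

    Assumption: the ring is defined by two percentiles of the similarity
    distribution, so both the easiest and the hardest candidates are dropped.
    """
    sims = candidates @ query
    lo, hi = np.percentile(sims, [lower_pct, upper_pct])
    return np.flatnonzero((sims >= lo) & (sims <= hi))

def nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: negative log-softmax score of the positive."""
    logits = np.concatenate(([positive @ query], negatives @ query)) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Toy data (hypothetical): one query, one nearby positive, a pool of candidates.
rng = np.random.default_rng(0)
dim, pool = 16, 200
query = l2_normalize(rng.normal(size=dim))
positive = l2_normalize(query + 0.1 * rng.normal(size=dim))
candidates = l2_normalize(rng.normal(size=(pool, dim)))

idx = ring_negative_indices(query, candidates)
ring_loss = nce_loss(query, positive, candidates[idx])
```

Here `idx` selects roughly the 50th to 90th percentile of candidates by similarity, and `ring_loss` is the contrastive loss computed with only those negatives. Standard NCE would instead pass a uniformly sampled subset of `candidates` to `nce_loss`.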