Abstract: Instance discrimination learns visual representations by treating different augmented views
of the same image as positive pairs. While this encourages invariance to handcrafted transformations,
same-image positives can preserve nuisance correlations such as background,
texture, illumination, and object-specific details. Semantic positive pairs, i.e., distinct
instances of the same class, may reduce these correlations by presenting objects in diverse
contexts. However, previous studies often combine semantic pairs with augmented positives
or false neighbors (i.e., incorrectly matched semantic pairs), making it difficult to isolate the
effect of semantic pairing. We present a controlled empirical study of semantic positive pairs
for self-supervised representation learning (SSL). From ImageNet-1K, we construct two matched
subsets: an augmented-pair baseline and a manually curated semantic-pair dataset with the
same class composition and training-pair count. We use these datasets to compare representative
contrastive and non-contrastive SSL methods under matched training conditions.
Across transfer learning and object detection evaluations, semantic-pair pretraining consistently
improves generalisation over augmented-pair pretraining. Additional ablations show
that semantic pairs induce invariances beyond the standard transformation pipeline. Among
the evaluated methods, contrastive approaches benefit most from semantic pairs, with
SimCLR showing the largest relative improvement. These results clarify the role of semantic
positive pairs in SSL and provide guidance for selecting and designing frameworks that can
exploit semantic pair information effectively.
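The matched-subset construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_matched_pairs`, its `labels` and `pairs_per_class` inputs, and the random same-class sampling (a stand-in for the manual curation of semantic pairs) are all hypothetical. Each augmented pair reuses its anchor image twice, and each semantic pair joins that same anchor with a different image of its class. As a result, both lists have identical class composition and pair count by construction.

```python
import random
from collections import defaultdict


def build_matched_pairs(labels, pairs_per_class, seed=0):
    """Return (augmented_pairs, semantic_pairs) with matched class composition
    and pair count.

    Hypothetical helper for illustration only. labels[k] is the class of image k.
    Random same-class sampling stands in for manual curation (an assumption).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)

    augmented, semantic = [], []
    for c in sorted(by_class):
        members = by_class[c]
        if len(members) < 2:
            continue  # a semantic pair needs two distinct instances of the class
        for _ in range(pairs_per_class):
            i, j = rng.sample(members, 2)  # two distinct images of class c
            semantic.append((i, j))        # positive = different same-class image
            augmented.append((i, i))       # positive = second augmented view of i
    return augmented, semantic


# Toy usage: class 2 has a single image, so it contributes no pairs to either list.
labels = [0, 0, 0, 1, 1, 2]
aug, sem = build_matched_pairs(labels, pairs_per_class=2)
print(len(aug), len(sem))  # 4 4
```

Sharing anchors between the two lists is one simple way to hold class composition and pair count fixed. The only variable is then whether the positive is a second augmented view of the anchor or a different instance of the same class.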
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Modified Figure 3
Assigned Action Editor: ~Bryan_Allen_Plummer1
Submission Number: 9275