Adaptively Labeling Vision Datasets Via Instance-Level Retrieval

ICLR 2026 Conference Submission 14399 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Computer Vision, Deep Learning, Self-Supervised
Abstract: Human annotations are the backbone of modern computer vision, but they are increasingly recognized as an inefficient resource: they typically capture only a single, fixed view of the rich visual information present in images. How can we move toward datasets that are labeled adaptively, rather than exhaustively by hand? We propose Instance-Level Retrieval, a method for adaptively building object detection datasets from large collections of unlabeled images. Given just a handful of seed examples, our method automatically finds and labels relevant training data by comparing self-supervised object representations. Starting from a small subset of Pascal VOC (Visual Object Classes), we demonstrate that it is possible to retrieve a high-quality set of images. In experiments that control for data scale, models trained on our adaptively labeled data exceed the performance of training on the original Pascal VOC human annotations by $0.08$ mAP. Applying our retrieval method to out-of-distribution unlabeled images derived from ImageNet-1K, we show that it can find high-quality exemplars for the fixed set of image classes in the Pascal VOC training split. Training on this expanded training set yields an additional $0.105$ mAP improvement over the baseline. Finally, we show that our methodology is also useful for filtering and selecting high-quality subsets of human-annotated data, yielding a $0.037$ mAP gain compared to uniformly sampled subsets.
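The core retrieval step described in the abstract — scoring unlabeled candidates against a handful of seed examples via their self-supervised object embeddings — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the function name `retrieve_instances`, the use of cosine similarity, and the max-over-seeds scoring rule are all assumptions for the sake of the example.

```python
import numpy as np

def retrieve_instances(seed_embs: np.ndarray, pool_embs: np.ndarray, k: int = 5):
    """Rank unlabeled pool items by similarity to a handful of seed embeddings.

    seed_embs: (n_seeds, d) self-supervised embeddings of the seed objects
    pool_embs: (n_pool, d)  embeddings of candidate objects in the unlabeled pool
    Returns the indices and scores of the top-k candidates.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    seeds = seed_embs / np.linalg.norm(seed_embs, axis=1, keepdims=True)
    pool = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)

    sims = pool @ seeds.T          # (n_pool, n_seeds) pairwise cosine similarities
    scores = sims.max(axis=1)      # score = best match against any seed (assumption)
    top = np.argsort(-scores)[:k]  # top-k candidates, highest score first
    return top, scores[top]
```

In practice the embeddings would come from a self-supervised backbone applied to object-level crops or region proposals; retrieved candidates above a score threshold would then be added to the training set.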
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 14399