Keywords: Synthetic Labelling, Dataset Generation, Object Detection
Abstract: Human annotations are the backbone of modern computer vision, but it is becoming clear that human data is an inefficient resource. Human annotations typically capture a *single fixed-view* of the otherwise rich visual information present in data. How can we move towards computer vision datasets that are *adaptively labeled*? We propose Instance-Level Retrieval, a method that adaptively builds datasets for object detection from large collections of unlabeled images. Given a handful of examples, our method finds and labels the most relevant training data by comparing self-supervised representations for objects. Starting from an unlabeled image set derived from the Pascal VOC training, we rebuild Pascal VOC without human annotations. In experiments that control data scale, models trained on our data not only match training on the original Pascal VOC human annotations but exhibit an average improvement of $0.009$ mAP. Code for the method and examples are available at: [https://instance-rag.github.io](instance-rag.github.io)
Submission Number: 71
Loading