Data Curation for Large Scale Detection Pretraining

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference · Withdrawn Submission
Keywords: Detection, Scaling, Robustness, Datasets
TL;DR: We scale datasets for detection pseudo-labeling
Abstract: Large multimodal datasets gathered from the internet have been a key driver of progress in recent image-text models such as DALL-E, CLIP, and Flamingo. However, structured prediction tasks have not seen the same benefits, as noisy fine-grained annotations do not exist at web scale. In this paper, we aim to extend the gains enabled by web-sourced training sets to the problem of object detection. First, we show that data curation for grounding and localization necessitates its own approach: filtering methods that produce good datasets for image classification with CLIP models (e.g., the image-text similarity filtering from LAION-5B) do not yield better object detectors. Instead, we introduce new detection-focused filtering methods that match or outperform existing object detectors pretrained on fully supervised detection datasets. When trained on 102.4M images from the 12.8B-image DataComp pool in a weakly supervised manner, our new filtering method matches the performance of a detector pretrained on Objects365, the largest fully annotated detection dataset. In addition, our filtering approach scales well with training set size and can be combined with Objects365 to yield further improvements. To aid further research in this area, we release a 2.8B-image subset of DataComp-12.8B pseudo-labeled with region proposals and detection bounding boxes.
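The LAION-5B-style baseline the abstract contrasts against can be sketched as a simple threshold on CLIP image-text cosine similarity. The sketch below is illustrative only: the function name, the embeddings, and the 0.28 threshold are assumptions, not values from the paper, and in practice the embeddings would come from a pretrained CLIP model rather than toy vectors.

```python
# Minimal sketch of CLIP-score (image-text similarity) filtering,
# the classification-oriented baseline the paper finds insufficient
# for detection pretraining. All names and thresholds are illustrative.
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)


def filter_by_clip_score(image_embs: np.ndarray,
                         text_embs: np.ndarray,
                         threshold: float = 0.28):
    """Keep image-text pairs whose similarity meets the threshold.

    Returns a boolean keep-mask and the raw similarity scores.
    The 0.28 default is a commonly used LAION-style cutoff, assumed here.
    """
    scores = cosine_sim(image_embs, text_embs)
    return scores >= threshold, scores


if __name__ == "__main__":
    # Toy embeddings: pair 0 is perfectly aligned, pair 1 orthogonal,
    # pair 2 partially aligned (cosine = 1/sqrt(2) ~= 0.707).
    img = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    txt = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
    keep, scores = filter_by_clip_score(img, txt)
    print(keep.tolist())  # → [True, False, True]
```

The paper's point is that this per-pair similarity score says nothing about object localization quality, which is why its detection-focused filters replace or augment it with region-level signals.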
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4555