Aligning Pretraining for Detection via Object-Level Contrastive Learning

Fangyun Wei; Yue Gao; Zhirong Wu; Han Hu; Stephen Lin

Published: 09 Nov 2021, Last Modified: 26 May 2025

Venue: NeurIPS 2021 Spotlight

Readers: Everyone

Keywords: self-supervised learning, object detection, pretraining

Abstract: Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. Such generality for transfer learning, however, sacrifices specificity if we are interested in a certain downstream task. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. In this paper, we follow this principle with a pretraining method specifically designed for the task of object detection. We attain alignment in the following three aspects: 1) object-level representations are introduced via selective search bounding boxes as object proposals; 2) the pretraining network architecture incorporates the same dedicated modules used in the detection pipeline (e.g. FPN); 3) the pretraining is equipped with object detection properties such as object-level translation invariance and scale invariance. Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection using a Mask R-CNN framework. Code is available at https://github.com/hologerry/SoCo.
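
To make the idea of object-level contrastive learning concrete, here is a minimal sketch of a contrastive loss computed over per-proposal features rather than whole-image features. It is an illustration only, not the SoCo implementation: the abstract does not specify the loss, so a generic InfoNCE-style objective is used as a stand-in, and the names `object_info_nce`, `view1`, `view2`, and `temperature` are hypothetical. The sketch assumes each proposal box has already been pooled into a feature vector in two augmented views of the same image.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def object_info_nce(view1, view2, temperature=0.2):
    """Generic InfoNCE loss over object-level features (a stand-in objective).

    view1[i] and view2[i] are assumed to be features of the SAME proposal box
    seen in two augmented views; every other box in view2 acts as a negative.
    """
    assert len(view1) == len(view2) and len(view1) > 0
    total = 0.0
    for i, query in enumerate(view1):
        logits = [cosine(query, key) / temperature for key in view2]
        # Numerically stable log-sum-exp over all candidate boxes.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching box as the positive.
        total += log_denom - logits[i]
    return total / len(view1)
```

The loss is small when each box's feature in one view is closest to the feature of the same box in the other view, which is the property an object-level pretext task is meant to encourage.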

TL;DR: We introduce a self-supervised pretraining framework for object detection.

Supplementary Material: pdf

Code: https://github.com/hologerry/SoCo
