Event Certifications: lifelong-ml.cc/CoLLAs/2023/Journal_Track
Abstract: A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised
loss. The underlying assumption is that randomly cropped and resized regions of a given
image share information about the objects of interest, which is captured by the learned
representation. This assumption is mostly satisfied in datasets such as ImageNet where
there is a large, centered object, which is highly likely to be present in random crops of
the full image. However, in other datasets such as OpenImages or COCO, which are more
representative of real-world uncurated data, there are typically multiple small objects in
an image. In this work, we show that self-supervised learning based on the usual random
cropping performs poorly on such datasets (measured by the difference from fully-supervised
learning). Instead of using pairs of random crops, we propose to leverage an unsupervised
object proposal technique; the first view is a crop obtained from this algorithm, and the
second view is a dilated version of the first view. This encourages the self-supervised model
to learn both object-level and scene-level semantic representations. Using this approach, which we
call object-aware cropping, results in significant improvements over random scene cropping on
classification and object detection benchmarks. For example, for pre-training on OpenImages,
our approach achieves an improvement of 8.8% mAP over random scene cropping (both methods
using MoCo-v2). We also show significant improvements on COCO and PASCAL-VOC
object detection and segmentation tasks over the state-of-the-art self-supervised learning
approaches. Our approach is efficient, simple and general, and can be used in most existing
contrastive and non-contrastive self-supervised learning frameworks.
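The two-view construction described in the abstract can be illustrated with a minimal sketch. It assumes an object proposal box is already available from some unsupervised proposal method, and it treats "dilated" as growing that box by a fixed fraction of its size on every side, clipped to the image. The names `dilate_box` and `object_aware_views`, the `ratio` parameter, and its default value are hypothetical choices made for illustration; they are not taken from the paper or its code release.

```python
from typing import Tuple

# Box layout assumed here: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


def dilate_box(box: Box, ratio: float, img_w: int, img_h: int) -> Box:
    """Grow `box` by `ratio` of its width and height on every side, clipped to the image."""
    left, top, right, bottom = box
    dx = int(round((right - left) * ratio))  # horizontal margin added on each side
    dy = int(round((bottom - top) * ratio))  # vertical margin added on each side
    return (
        max(0, left - dx),
        max(0, top - dy),
        min(img_w, right + dx),
        min(img_h, bottom + dy),
    )


def object_aware_views(
    proposal: Box, img_w: int, img_h: int, ratio: float = 0.2
) -> Tuple[Box, Box]:
    """Return crop boxes for the two positive views.

    The first view is the object proposal itself; the second view is a
    dilated copy of it, so it contains the object plus surrounding scene context.
    The default `ratio` is an illustrative assumption, not a value from the paper.
    """
    first_view = proposal
    second_view = dilate_box(proposal, ratio, img_w, img_h)
    return first_view, second_view
```

In a full pipeline, each returned box would be cropped from the image, resized, and passed through the usual augmentations before entering the self-supervised loss (for example MoCo-v2, as mentioned in the abstract).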
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We addressed the reviewer comments, which primarily requested additional ablation studies and raised a few clarification questions.
Code: https://github.com/shlokk/object-cropping-ssl
Assigned Action Editor: ~Yale_Song1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 432