Abstract: A core component of the recent success of self-supervised learning is cropping
data augmentation, which selects sub-regions of an image to be used as positive
views in the self-supervised loss. The underlying assumption is that randomly
cropped and resized regions of a given image share information about the objects of
interest, which the learned representation will capture. This assumption is mostly
satisfied in datasets such as ImageNet, where there is typically a large, centered object
that is highly likely to appear in random crops of the full image. However, in other
datasets such as OpenImages or COCO, which are more representative of real-world
uncurated data, there are typically multiple small objects in an image. In this work,
we show that self-supervised learning based on the usual random cropping performs
poorly on such datasets. We propose replacing one or both of the random crops
with crops obtained from an object proposal algorithm. This encourages the model
to learn both object- and scene-level semantic representations. This approach,
which we call object-aware cropping, yields significant improvements over random
scene-level cropping on classification and object detection benchmarks. For example, on
OpenImages, our approach achieves an improvement of 8.8% mAP over random
scene-level cropping using MoCo-v2 based pre-training. We also show significant
improvements on COCO and PASCAL-VOC object detection and segmentation
tasks over the state-of-the-art self-supervised learning approaches. Our approach
is efficient, simple and general, and can be used in most existing contrastive and
non-contrastive self-supervised learning frameworks.
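The abstract leaves the proposal algorithm, crop parameters, and proposal filtering unspecified, so the following is only a minimal Python sketch under stated assumptions of how one or both random crops might be replaced by object-proposal crops. It assumes proposals are precomputed in-bounds boxes from some external proposal algorithm; the helper names (`random_scene_crop`, `object_crop`, `sample_views`), the `min_size` filter, the fallback to a scene crop, and the default `scale`/`ratio` ranges are hypothetical choices, not the authors' implementation. The scene crop imitates RandomResizedCrop-style sampling (random area fraction, log-uniform aspect ratio).

```python
import math
import random
from typing import List, Tuple

# A crop box in pixel coordinates: (x0, y0, x1, y1), with x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def random_scene_crop(width: int, height: int, rng: random.Random,
                      scale: Tuple[float, float] = (0.2, 1.0),
                      ratio: Tuple[float, float] = (3 / 4, 4 / 3),
                      attempts: int = 10) -> Box:
    """Sample a RandomResizedCrop-style box over the whole scene (assumed defaults)."""
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))  # log-uniform aspect ratio
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x0 = rng.randint(0, width - w)  # randint is inclusive, so x0 + w <= width
            y0 = rng.randint(0, height - h)
            return (x0, y0, x0 + w, y0 + h)
    return (0, 0, width, height)  # fallback: use the full image


def object_crop(proposals: List[Box], width: int, height: int,
                rng: random.Random, min_size: int = 32) -> Box:
    """Pick one object proposal box; fall back to a scene crop if none is usable."""
    # Hypothetical filter: drop proposals too small to serve as a useful view.
    usable = [b for b in proposals
              if b[2] - b[0] >= min_size and b[3] - b[1] >= min_size]
    if not usable:
        return random_scene_crop(width, height, rng)
    return rng.choice(usable)


def sample_views(width: int, height: int, proposals: List[Box],
                 mode: str, rng: random.Random) -> Tuple[Box, Box]:
    """Return the two positive-view crop boxes for one image."""
    if mode == "scene-scene":    # usual random scene-level cropping baseline
        return (random_scene_crop(width, height, rng),
                random_scene_crop(width, height, rng))
    if mode == "object-scene":   # replace one random crop with an object crop
        return (object_crop(proposals, width, height, rng),
                random_scene_crop(width, height, rng))
    if mode == "object-object":  # replace both random crops with object crops
        return (object_crop(proposals, width, height, rng),
                object_crop(proposals, width, height, rng))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative proposal boxes for a 640x480 image (made up for the example).
    boxes = [(10, 10, 100, 120), (200, 50, 300, 180), (5, 5, 15, 15)]
    print(sample_views(640, 480, boxes, "object-scene", rng))
```

Only crop boxes are produced here; resizing each crop and applying the remaining augmentations of the chosen contrastive or non-contrastive framework would follow unchanged, which reflects the claim that the method drops into existing pipelines.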