Abstract: Self-supervised pretraining methods have recently achieved impressive results, matching ImageNet-pretrained weights on a variety of downstream tasks, including object detection. Despite this success, these methods have several limitations. Most are optimized for image classification and compute only a single global feature vector describing an entire image. Moreover, they rely on large batch sizes, huge amounts of unlabeled data, and vast computing resources to work well.