- Keywords: Contrastive learning, Self-supervised learning, Unsupervised learning, Stronger augmentations
- Abstract: Representation learning has been greatly improved with the advance of contrastive learning methods with the performance being closer to their supervised learning counterparts. Those methods have greatly benefited from various data augmentations that are carefully designated to maintain their identities so that the images transformed from the same instance can still be retrieved. Although stronger augmentations could expose novel patterns of representations to improve their generalizability, directly using stronger augmentations in instance discrimination-based contrastive learning may even deteriorate the performance, because the distortions induced from the stronger augmentations could ridiculously change the image structures and thus the transformed images cannot be viewed as the same as the original ones any more. Additional efforts are needed for us to explore the role of the stronger augmentations in further pushing the performance of unsupervised learning to the fully supervised upper bound. Instead of applying the stronger augmentations directly to minimize the contrastive loss, we propose to minimize the distribution divergence between the weakly and strongly augmented images over the representation bank to supervise the retrieval of strongly augmented queries from a pool of candidates. This avoids an overoptimistic assumption that could overfit the strongly augmented queries containing distorted visual structures into the positive targets in the representation bank, while still being able to distinguish them from the negative samples by leveraging the distributions of weakly augmented counterparts. The proposed method achieves top-1 accuracy of 76.2% on ImageNet with a standard ResNet-50 architecture with a single-layer classifier fine-tuned. This is almost the same as 76.5% of top-1 accuracy with a fully supervised ResNet-50. Moreover, it outperforms the previous self-supervised and supervised methods on both the transfer learning and object detection tasks.
- One-sentence Summary: This paper presents a novel contrastive learning model that enables the use of stronger augmentations via distributional divergence minimization to achieve a new record of accuracy with vanishing performance gap to the fully supervised network.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics