Efficient Self-Supervised Contrastive Learning with Representation Mixing

Published: 2024, Last Modified: 27 Jan 2026, BDA 2024, License: CC BY-SA 4.0
Abstract: Self-Supervised Learning (SSL) has emerged as a powerful method for pre-training image and language models for downstream tasks. SSL does not require labeled data to learn large models, whose features can then be adapted with supervised learning for the required tasks, which makes it particularly well suited to settings with a paucity of labeled data. SSL for images typically relies on defining a pretext task, such as image inpainting, image shuffling, or image contrastive learning. A major challenge for SSL, however, is its computational cost: training with SSL takes much longer than supervised learning and therefore requires more compute. This is a growing concern, with the energy consumption of deep learning rising exponentially over the last few years. In this work, we develop a novel Representation Mixing method for Self-Supervised Contrastive Representation Learning that trains much faster than standard contrastive self-supervised learning methods. We use a feature transformation strategy that combines augmented and normalized views at the end of the network to form feature representations that are further apart in the learned embedding space. We conduct experiments on several benchmark datasets to validate our claims and show equivalent downstream accuracy with reduced pre-training time. We also release our code for other researchers (https://github.com/Yash-Kumar-Sharma/Efficient-Self-Supervised-Contrastive-Learning-with-Representation-Mixing).
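
Below is a minimal sketch of how mixing normalized view representations could plug into a standard contrastive pipeline. The helper names (`mix_representations`, `nt_xent_loss`), the fixed mixing coefficient `alpha`, and the use of a SimCLR-style NT-Xent loss are assumptions for illustration only; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def mix_representations(z1, z2, alpha=0.5):
    """Combine two L2-normalized projections of the same batch into a mixed
    representation, then re-normalize so it lies on the unit hypersphere.

    z1, z2: (batch, dim) projections of two augmented views.
    alpha:  mixing coefficient (hypothetical; the paper may choose it differently).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    mixed = alpha * z1 + (1.0 - alpha) * z2
    return F.normalize(mixed, dim=1)

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent (SimCLR-style) contrastive loss, shown here only to
    indicate where mixed representations could enter the objective."""
    batch = z_a.shape[0]
    z = torch.cat([z_a, z_b], dim=0)                      # (2B, dim)
    sim = torch.mm(z, z.t()) / temperature                # pairwise similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-pairs
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)])         # positive is the paired view
    return F.cross_entropy(sim, targets)

# Usage sketch: given projections z1, z2 from two augmented views of a batch,
# mix them and contrast the mixed representation against one of the views.
z1, z2 = torch.randn(128, 256), torch.randn(128, 256)
z_mixed = mix_representations(z1, z2, alpha=0.5)
loss = nt_xent_loss(z_mixed, F.normalize(z1, dim=1))
```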