MC-SSL: Towards Multi-Concept Self-Supervised Learning

Published: 01 Feb 2023, Last Modified: 13 Feb 2023 · Submitted to ICLR 2023
Keywords: Self-supervised Learning, Group Masked Model Learning, Masked Autoencoders, Vision Transformers, Knowledge Distillation
Abstract: Self-supervised pre-training is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pre-training has been shown to outperform supervised pre-training for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts but are annotated with a single dominant class label. Although Self-Supervised Learning (SSL) is, in principle, free of this limitation, the choice of a pretext task facilitating SSL can perpetuate this shortcoming by driving the learning process towards a single-concept output. This study investigates the possibility of modelling all the concepts present in an image without using labels. In this respect, the proposed Multi-Concept SSL (MC-SSL) framework is a step towards unsupervised learning that embraces the diverse content of an image, with the aim of explicitly modelling the information from all the concepts it contains. MC-SSL involves two core design steps: group masked model learning (GMML) and the learning of pseudo-concepts for data tokens using a momentum-encoder (teacher-student) framework. An added benefit of MC-SSL is the ability to train data-hungry transformers on small datasets with high accuracy, without external data. Experimental results on multi-label and multi-class image classification downstream tasks demonstrate that MC-SSL not only surpasses existing SSL methods but also outperforms supervised transfer learning. The source code will be made publicly available for the community to train on bigger corpora.
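To make the two core design steps concrete, below is a minimal, hypothetical PyTorch sketch of (i) GMML-style group masking, which hides contiguous blocks of image patches rather than isolated tokens, and (ii) an EMA momentum teacher whose per-token outputs act as pseudo-concept targets for the masked positions. All names, shapes, and hyperparameters here (`group_mask`, `mc_ssl_step`, the 0.996 momentum, the temperature) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the two MC-SSL ingredients named in the abstract:
# (1) group masking of patch tokens, (2) an EMA momentum (teacher) encoder
# providing per-token pseudo-concept targets. Assumed, not the paper's code.
import torch
import torch.nn.functional as F


def group_mask(grid: int, mask_ratio: float = 0.5, block: int = 4) -> torch.Tensor:
    """Mask contiguous block-shaped groups of patches until ~mask_ratio is covered."""
    mask = torch.zeros(grid, grid, dtype=torch.bool)
    while mask.float().mean() < mask_ratio:
        top = torch.randint(0, grid - block + 1, (1,)).item()
        left = torch.randint(0, grid - block + 1, (1,)).item()
        mask[top:top + block, left:left + block] = True
    return mask.flatten()  # (grid*grid,), True = masked patch


@torch.no_grad()
def update_teacher(teacher, student, momentum: float = 0.996):
    """EMA update of the momentum (teacher) encoder from the student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)


def mc_ssl_step(student, teacher, tokens, mask, optimizer, temp: float = 0.1):
    """One illustrative step: the student encodes group-masked tokens; the
    teacher's per-token distributions on the intact image serve as
    pseudo-concept targets at the masked positions.

    `student` and `teacher` are assumed to map (B, N, D) token embeddings
    to (B, N, K) per-token concept logits.
    """
    masked_tokens = tokens.clone()
    masked_tokens[:, mask] = 0.0  # corrupt the masked groups
    with torch.no_grad():
        targets = F.softmax(teacher(tokens) / temp, dim=-1)
    preds = F.log_softmax(student(masked_tokens) / temp, dim=-1)
    # Cross-entropy against teacher pseudo-concepts, masked positions only.
    loss = -(targets[:, mask] * preds[:, mask]).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher(teacher, student)
    return loss.item()


# Usage sketch: teacher = copy.deepcopy(student), with the teacher's
# parameters frozen (requires_grad_(False)); e.g. mask = group_mask(14)
# for a 14x14 patch grid (224px image, 16px patches).
```

Masking whole groups of neighbouring patches forces the student to recover semantics from context rather than interpolate locally, while the slowly-moving EMA teacher keeps the pseudo-concept targets stable enough to avoid collapse; both choices follow the abstract's description, with the specifics above left as assumptions.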
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Unsupervised and Self-supervised learning
Supplementary Material: zip