Multi-Domain Self-Supervised Learning

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: self-supervised learning, contrastive learning, multi-domain data, unsupervised learning
Abstract: Contrastive self-supervised learning has recently gained significant attention owing to its ability to learn improved feature representations without the use of label information. Current contrastive learning approaches, however, are only effective when trained on a single dataset, which limits their utility in diverse multi-domain settings. In fact, training these methods on a combination of several domains often degrades the quality of the learned representations compared to models trained on a single domain. In this paper, we propose a Multi-Domain Self-Supervised Learning (MDSSL) approach that can effectively perform representation learning on multiple, diverse datasets. In MDSSL, we propose a three-level hierarchical loss that measures the agreement between augmented views of a given sample, between samples within a dataset, and between samples across datasets. We show that MDSSL, when trained on a mixture of CIFAR-10, STL-10, SVHN, and CIFAR-100, produces powerful representations, achieving up to a $25\%$ increase in the top-1 accuracy of a linear classifier compared to single-domain self-supervised encoders. Moreover, MDSSL encoders generalize more effectively to unseen datasets than both single-domain and multi-domain baselines. MDSSL is also highly resource-efficient: it stores and trains a single model for multiple datasets, leading to up to a $17\%$ reduction in training time. Finally, for multi-domain datasets where domain labels are unknown, we propose a modified approach that alternates between clustering and MDSSL. Thus, for diverse multi-domain datasets (even without domain labels), MDSSL provides an efficient and generalizable self-supervised encoder without sacrificing the quality of representations in individual domains.
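The abstract describes the three-level hierarchical loss only at a high level, so the following is a minimal sketch of how such an objective could be composed, assuming an NT-Xent (SimCLR-style) term at the view level and SupCon-style terms at the two dataset levels. The helper names (`masked_contrastive`, `mdssl_loss`), the weights `lam_within`/`lam_across`, and the exact form of levels 2 and 3 are illustrative assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, tau=0.5):
    """Level 1: SimCLR-style agreement between two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    logits = z @ z.t() / tau
    # A view is never its own positive: mask self-similarity out of the softmax.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float("-inf"))
    # Positive for row i is its other view: i <-> i + n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(logits, targets)


def masked_contrastive(z, pos_mask, tau=0.5):
    """SupCon-style term: pull together every pair flagged in pos_mask."""
    n = z.size(0)
    z = F.normalize(z, dim=1)
    sim = torch.exp(z @ z.t() / tau)
    # Exclude self-pairs from the partition function.
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool, device=z.device), 0.0)
    log_prob = torch.log(sim / sim.sum(dim=1, keepdim=True) + 1e-8)
    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    return -((pos_mask * log_prob).sum(dim=1) / n_pos).mean()


def mdssl_loss(z1, z2, domain_ids, lam_within=0.5, lam_across=0.25):
    """Hypothetical composition of the three levels; the weights are guesses."""
    same = domain_ids.unsqueeze(0) == domain_ids.unsqueeze(1)
    eye = torch.eye(len(domain_ids), dtype=torch.bool, device=domain_ids.device)
    loss_views = nt_xent(z1, z2)                                 # level 1: views
    loss_within = masked_contrastive(z1, (same & ~eye).float())  # level 2: within-dataset
    loss_across = masked_contrastive(z1, (~same).float())        # level 3: across-dataset
    return loss_views + lam_within * loss_within + lam_across * loss_across
```

Here `domain_ids` would come from the dataset of origin (CIFAR-10 vs. SVHN, etc.). For the label-free variant, the abstract only says that the method alternates between clustering and MDSSL; one plausible (assumed) reading is to periodically re-cluster encoder outputs into pseudo-domains and feed those ids into the loss above:

```python
from sklearn.cluster import KMeans


def pseudo_domains(embeddings, n_domains):
    """Assumed stand-in for the clustering step: k-means pseudo-domain ids
    computed on encoder outputs (a NumPy array of shape [N, d])."""
    return KMeans(n_clusters=n_domains, n_init=10).fit_predict(embeddings)
```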
One-sentence Summary: We present a novel approach for learning self-supervised representations on multi-domain data in an efficient and generalizable manner.
Supplementary Material: zip
