Efficient Unsupervised Knowledge Distillation with Space Similarity

24 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: unsupervised knowledge distillation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Efficient knowledge distillation in an unsupervised setting
Abstract: In this paper, we aim to boost the performance of knowledge distillation in the absence of ground-truth labels; a student can therefore only rely on the responses generated by its teacher. Many existing approaches in this setting rely on some form of feature/embedding queue to capture neighbourhood information, and these queues can hold more than 100k samples. Some of these methods also rely on a multitude of operations that increase the memory requirement for training many-fold. In this work, we show that, working only with the input batch (often of size $256$), it is possible not only to incorporate neighbourhood information but also to obtain state-of-the-art unsupervised distillation performance. We achieve this by introducing a simple space similarity loss component that works alongside the well-known normalized cosine similarity computed on the final features. In this loss, we encourage each dimension of a student's feature space to be similar to the corresponding dimension of its teacher's. With this seemingly simple addition, we are able to compete against many contemporary methods that rely either on a large number of queued features or on heavy pre-processing. We perform extensive experiments comparing our proposed approach to other state-of-the-art methods on various computer vision tasks with established architectures.
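The objective described in the abstract can be illustrated with a short PyTorch sketch: a per-sample cosine similarity on the final features plus a "space similarity" term that applies the same comparison to each feature dimension across the batch. The function name `coss_distillation_loss`, the equal weighting of the two terms, and the assumption that student and teacher features have already been projected to the same dimensionality are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def coss_distillation_loss(student_feats: torch.Tensor,
                           teacher_feats: torch.Tensor) -> torch.Tensor:
    """Sketch of the described objective for B x D feature matrices.

    Term 1 (feature similarity): cosine similarity between matching rows,
            i.e. one comparison per input sample.
    Term 2 (space similarity): cosine similarity between matching columns,
            i.e. one comparison per feature dimension across the batch.
    """
    # Normalized cosine similarity on the final features (per sample).
    feat_sim = F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()

    # Space similarity: compare each dimension of the student's feature space
    # with the corresponding dimension of the teacher's, across the batch.
    space_sim = F.cosine_similarity(student_feats.t(), teacher_feats.t(), dim=-1).mean()

    # Maximise both similarities => minimise their negatives (equal weights assumed).
    return -(feat_sim + space_sim)


if __name__ == "__main__":
    # Example with a batch of 256 samples, the batch size mentioned in the abstract;
    # the feature dimension of 384 is a hypothetical choice.
    student = torch.randn(256, 384, requires_grad=True)
    teacher = torch.randn(256, 384)  # assumed already projected to the student's width
    loss = coss_distillation_loss(student, teacher)
    loss.backward()
    print(loss.item())
```

Note that the only extra cost over a plain cosine-similarity distillation loss is one comparison over the transposed feature matrices, which is consistent with the abstract's claim of working within a single batch rather than a large feature queue.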
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9077