Knowledge Distillation By Sparse Representation Matching

Dat Thanh Tran; Moncef Gabbouj; Alexandros Iosifidis

28 Sept 2020 (modified: 22 Jun 2025), ICLR 2021 Conference Blind Submission, Readers: Everyone

Keywords: Knowledge Distillation, Sparse Representation, Transfer Learning

Abstract: Knowledge Distillation (KD) refers to a class of methods that transfer knowledge from a teacher network to a student network. In this paper, we propose Sparse Representation Matching (SRM), a method to transfer intermediate knowledge obtained from one Convolutional Neural Network (CNN) to another by utilizing sparse representation learning. SRM first extracts sparse representations of the hidden features of the teacher CNN, which are then used to generate both pixel-level and image-level labels for training intermediate feature maps of the student network. We formulate SRM as a neural processing block, which can be efficiently optimized using stochastic gradient descent and integrated into any CNN in a plug-and-play manner. Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
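The idea of turning teacher features into pixel-level and image-level labels can be illustrated with a toy sketch. This is not the authors' formulation: the abstract does not specify how the sparse codes or labels are computed, so the sketch assumes a fixed dictionary, uses hard-thresholded projections in place of learned sparse coding, takes the index of the strongest atom as the pixel-level label, and max-pools the codes over pixels as the image-level label. All function names, the dictionary `atoms`, and the parameter `k` are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_code(feature: Vector, atoms: List[Vector], k: int) -> Vector:
    """Project onto every atom, then keep only the k largest-magnitude
    coefficients (hard thresholding; an assumed stand-in for learned sparse coding)."""
    scores = [dot(feature, atom) for atom in atoms]
    keep = sorted(range(len(atoms)), key=lambda i: abs(scores[i]), reverse=True)[:k]
    return [scores[i] if i in keep else 0.0 for i in range(len(atoms))]


def pixel_label(code: Vector) -> int:
    """Assumed pixel-level pseudo-label: index of the most strongly activated atom."""
    return max(range(len(code)), key=lambda i: abs(code[i]))


def image_label(codes: List[Vector]) -> Vector:
    """Assumed image-level pseudo-label: max-pool absolute codes over all pixels."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


# Toy teacher feature map: 3 pixels, each with a 2-dimensional hidden feature.
teacher_pixels = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
# Hypothetical overcomplete dictionary: 3 unit-norm atoms in a 2-dimensional space.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

codes = [sparse_code(p, atoms, k=1) for p in teacher_pixels]
pixel_labels = [pixel_label(c) for c in codes]  # one class index per pixel
image_level = image_label(codes)                # one pooled vector per image
```

In a distillation setting, the student's intermediate feature maps would then be trained to predict `pixel_labels` and `image_level`, rather than to regress the teacher's dense features directly.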

One-sentence Summary: A knowledge distillation method that utilizes sparse representation to transfer intermediate knowledge in convolutional neural networks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/knowledge-distillation-by-sparse/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=bZKOzDui3e
