Supervision Complexity and its Role in Knowledge Distillation

Hrayr Harutyunyan; Ankit Singh Rawat; Aditya Krishna Menon; Seungyeon Kim; Sanjiv Kumar

Supervision Complexity and its Role in Knowledge Distillation

Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

Published: 01 Feb 2023, Last Modified: 27 Feb 2023ICLR 2023 posterReaders: Everyone

Keywords: distillation, kernel methods, neural tangent kernel

TL;DR: We provide a new theoretical perspective on knowledge distillation through the lens of supervision complexity -- a measure of alignment between the teacher-provided supervision and the student's neural tangent kernel.

Abstract: Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student's neural tangent kernel. The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions. Specifically, it provides a rigorous justification for the utility of various techniques that are prevalent in the context of distillation, such as early stopping and temperature scaling. Our analysis further suggests the use of online distillation, where a student receives increasingly more complex supervision from teachers in different stages of their training. We demonstrate efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)

11 Replies

Loading