Illuminating Dark Knowledge via Random Matrix Ensembles

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Abstract: It is all but certain that machine learning models based on deep neural networks will soon feature ubiquitously in a wide variety of critical products and services that people rely on. This should be a major cause for concern, given that we still lack a rigorous understanding of the failure modes of these systems and can hardly guarantee the conditions under which the models are expected to work. In particular, we would like to understand how these models manage to generalize so well, even when seemingly overparametrized, effectively evading many of the intuitions expected from statistical learning theory. We argue that distillation (Caruana et al., 2006; Hinton et al., 2014) provides a rich playground for understanding what enables generalization in a concrete setting. We carry out a precise high-dimensional analysis of generalization under distillation in a real-world setting, eschewing ad hoc assumptions and instead considering models actually encountered in the wild.
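For context, the distillation setup the abstract refers to is the standard one of Hinton et al.: a student is trained to match the teacher's temperature-softened output distribution alongside the hard labels. Below is a minimal sketch of that objective in PyTorch; the temperature `T` and mixing weight `alpha` are illustrative assumptions, not values from this paper, and the sketch does not reproduce the paper's random-matrix analysis of generalization.

```python
# Minimal sketch of the standard distillation loss (Hinton et al.).
# T and alpha are illustrative hyperparameters, not taken from this paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of the soft (teacher-matching) and hard (label) losses."""
    # KL divergence between temperature-softened student and teacher outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps soft-loss gradients comparable in scale
    # Ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```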
One-sentence Summary: We show that one can analytically predict the generalization performance of a model under distillation conditioned on its baseline performance.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=k9zbNQJGr