Knowledge Cascade: Reverse Knowledge Distillation

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Knowledge distillation, subsampling, large-scale data, nonparametric, reproducing kernel Hilbert space, asymptotic theory
Abstract: With the rapidly growing complexity of state-of-the-art machine learning models, the expensive training process has made algorithm design and the allocation of computational resources challenging. To tackle these challenges, we propose the knowledge cascade (KCas), a strategy that reverses the idea of knowledge distillation (KD). While KD compresses and transfers the knowledge learned by a large, complex model (the teacher model) to a small, simple model (the student model), KCas inversely transfers the knowledge in a student model to a teacher model. Although teacher models are more sophisticated and capable than student models, we show that in KCas, student models can effectively facilitate the construction of teacher models by exploiting statistical asymptotic theory. We demonstrate the strong performance of KCas on nonparametric multivariate function estimation in reproducing kernel Hilbert spaces. A crucial obstacle in this estimation problem is the daunting computational cost of selecting the smoothing parameters, whose number grows exponentially with the number of predictors. KCas transfers the knowledge about the smoothing parameters of the target function, learned by the student model, to the teacher model based on empirical and asymptotic results. KCas significantly reduces the computational cost of the smoothing parameter selection process from $O(n^3)$ to $O(n^{3/4})$ while preserving excellent performance. Theoretical analysis of asymptotic convergence rates and extensive empirical evaluations on simulated and real data validate the effectiveness of KCas.
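A minimal sketch of the student-to-teacher cascade described in the abstract, under stated assumptions: a Gaussian kernel, GCV-based smoothing parameter selection on a random subsample (the "student"), and a rescaling of the selected parameter to the full sample size via the classical rate $\lambda_{\text{opt}}(n) \asymp n^{-2m/(2m+1)}$ with $m = 2$. The kernel choice, subsample size, and rescaling exponent are illustrative assumptions, not the procedure detailed in the paper.

```python
# Hypothetical sketch (not the authors' code): pick a smoothing parameter cheaply on a
# small "student" subsample, rescale it asymptotically, and reuse it for one full-data
# "teacher" fit, skipping any smoothing-parameter search on the full data.
import numpy as np


def gaussian_kernel(X, Z, bandwidth=0.3):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def select_lambda_gcv(X, y, lambdas):
    """Pick the smoothing parameter minimizing generalized cross-validation (GCV)."""
    n = len(y)
    K = gaussian_kernel(X, X)
    best_lam, best_gcv = None, np.inf
    for lam in lambdas:
        # Smoother ("hat") matrix A(lam) = K (K + n*lam*I)^{-1}
        A = np.linalg.solve(K + n * lam * np.eye(n), K)
        resid = y - A @ y
        gcv = n * float(resid @ resid) / (n - np.trace(A)) ** 2
        if gcv < best_gcv:
            best_lam, best_gcv = lam, gcv
    return best_lam


rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 2))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.1 * rng.standard_normal(n)

# Student: GCV grid search on a subsample, costing O(s^3) per candidate instead of O(n^3).
s = 200
idx = rng.choice(n, size=s, replace=False)
lam_student = select_lambda_gcv(X[idx], y[idx], np.logspace(-8, 0, 25))

# Cascade step: rescale the student's smoothing parameter to the full sample size,
# assuming (illustratively) the rate lambda_opt(n) ~ n^{-2m/(2m+1)} with m = 2.
lam_teacher = lam_student * (s / n) ** (4.0 / 5.0)

# Teacher: a single full-data kernel ridge / smoothing fit with the cascaded lambda.
K_full = gaussian_kernel(X, X)
alpha = np.linalg.solve(K_full + n * lam_teacher * np.eye(n), y)
f_hat = lambda X_new: gaussian_kernel(X_new, X) @ alpha
```

In this reading, a subsample of size $s = O(n^{1/4})$ would make the per-candidate cost $O(s^3) = O(n^{3/4})$, matching the selection cost quoted in the abstract; the larger subsample above is used only so the toy example is numerically meaningful.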
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning