Teacher Guided Training: An Efficient Framework for Knowledge Transfer

Manzil Zaheer; Ankit Singh Rawat; Seungyeon Kim; Chong You; Himanshu Jain; Andreas Veit; Rob Fergus; Sanjiv Kumar

Published: 01 Feb 2023, Last Modified: 26 May 2025, ICLR 2023 poster, Readers: Everyone

Keywords: Distillation, Semi-supervised learning, Efficient machine learning, Generalization bounds, knowledge distillation

TL;DR: We propose and theoretically analyze a novel way to improve the training efficiency of compact student models, one that leverages the knowledge of pretrained generative (teacher) models better than standard distillation methods do.

Abstract: The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models into compact models for efficient deployment also requires a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models while obviating the need to go through a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a much lower-dimensional manifold than the input space. Furthermore, we can use the teacher to explore the input space more efficiently through sampling or gradient-based methods, thus making TGT especially attractive for limited-data or long-tail settings. We formally capture this benefit of the proposed data-domain exploration in our generalization bounds. We find that TGT can improve accuracy on several image classification benchmarks as well as on a range of text classification and retrieval tasks.
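To make the two exploration modes in the abstract concrete, here is a minimal toy sketch. It is our own illustrative assumption, not the authors' implementation or models. The teacher's "generator" is assumed to be a fixed linear decoder x = A z from a 2-D latent space into a 5-D input space, and teacher and student are both logistic classifiers. The student is distilled on teacher soft labels over three kinds of points: (i) a small real set, (ii) decoded random latent samples (sampling-based exploration), and (iii) latent points pushed by a few gradient-ascent steps on the teacher-student KL divergence (gradient-based exploration). All names (`decode`, `explore_latent`, `distill`, `BOX`, etc.) are hypothetical.

```python
import math
import random

# Toy, hypothetical setup (not the paper's models): 2-D latent space, 5-D input space.
rng = random.Random(0)
LATENT, INPUT = 2, 5
BOX = 2.0  # latent coordinates are clamped to [-BOX, BOX] so inputs stay bounded

# Hypothetical teacher: linear "generator" A (INPUT x LATENT) plus a logistic head w_t.
A = [[rng.uniform(-1, 1) for _ in range(LATENT)] for _ in range(INPUT)]
w_t = [rng.uniform(-1, 1) for _ in range(INPUT)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def clamp(v):
    return max(-BOX, min(BOX, v))


def decode(z):
    """Map a latent point onto the teacher's low-dimensional manifold: x = A z."""
    return [dot(row, z) for row in A]


def explore_latent(z, w_s, steps=3, eta=0.5):
    """Gradient ascent on KL(teacher || student) with respect to the latent z."""
    for _ in range(steps):
        x = decode(z)
        s_t, s_s = dot(w_t, x), dot(w_s, x)
        p_t, p_s = sigmoid(s_t), sigmoid(s_s)
        # dKL/dx = (p_s - p_t) * w_s + (s_t - s_s) * p_t * (1 - p_t) * w_t
        g_x = [(p_s - p_t) * ws + (s_t - s_s) * p_t * (1 - p_t) * wt
               for ws, wt in zip(w_s, w_t)]
        # Chain rule through x = A z: dKL/dz = A^T dKL/dx
        g_z = [sum(A[i][j] * g_x[i] for i in range(INPUT)) for j in range(LATENT)]
        z = [clamp(zj + eta * gj) for zj, gj in zip(z, g_z)]
    return z


def distill(real_z, steps=2000, batch=8, lr=0.02):
    """Train a logistic student on teacher soft labels over real + explored points."""
    w_s = [0.0] * INPUT
    for _ in range(steps):
        zs = [rng.choice(real_z) for _ in range(batch // 2)]  # limited real data
        zs += [[clamp(rng.gauss(0, 1)) for _ in range(LATENT)]  # sampling-based
               for _ in range(batch // 4)]
        zs += [explore_latent(rng.choice(real_z), w_s)  # gradient-based
               for _ in range(batch // 4)]
        grad = [0.0] * INPUT
        for z in zs:
            x = decode(z)
            p_t, p_s = sigmoid(dot(w_t, x)), sigmoid(dot(w_s, x))
            for i in range(INPUT):
                # d/dw_s of KL(p_t || p_s) for a logistic student is (p_s - p_t) * x
                grad[i] += (p_s - p_t) * x[i] / len(zs)
        w_s = [w - lr * g for w, g in zip(w_s, grad)]
    return w_s


real_z = [[clamp(rng.gauss(0, 1)) for _ in range(LATENT)] for _ in range(20)]
w_s = distill(real_z)
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_s, w_t)))
print(f"||w_s - w_t|| = {dist:.3f}, ||w_t|| = {math.sqrt(dot(w_t, w_t)):.3f}")
```

Running it prints the student-teacher weight distance next to ||w_t||, which is the initial distance because the student starts at zero. The sketch deliberately takes gradient steps on the latent z rather than directly on x, so every explored point stays on the low-dimensional manifold that the (assumed) generator represents.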

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/teacher-guided-training-an-efficient/code)