Towards Automated Distillation: A Systematic Study of Knowledge Distillation in Natural Language Processing

25 Feb 2022 (modified: 05 May 2023) · AutoML 2022 (Late-Breaking Workshop) · Readers: Everyone
Abstract: The key factors underpinning optimal Knowledge Distillation (KD) performance remain elusive, as their effects are often confounded within sophisticated distillation algorithms. This poses a challenge for choosing the best distillation algorithm from the large design space, for existing and new tasks alike, and it hinders automated distillation. In this work, we aim to identify how distillation performance across different tasks is affected by the components of the KD pipeline, such as the data augmentation policy, the loss function, and the intermediate knowledge transfer between the teacher and the student. To isolate their effects, we propose Distiller, a meta-KD framework that systematically combines key distillation techniques as components across the stages of the KD pipeline. Distiller enables us to quantify each component's contribution and to conduct experimental studies that yield insights about distillation performance: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) the best-performing distillation algorithms differ considerably across tasks, and 3) data augmentation provides a large boost for small training datasets or small student networks. Based on these insights, we propose a simple AutoDistiller algorithm that recommends a close-to-optimal KD pipeline for a new dataset or task. This is a first step toward automated KD that can save engineering costs and democratize practical KD applications.
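To make the pipeline components concrete, below is a minimal sketch (not the paper's released code) of a combined KD objective of the kind the abstract describes: a hard-label cross-entropy term, a temperature-scaled soft-target term, and an MSE term over intermediate representations, the component the study identifies as most important. All names and hyperparameters here (distillation_loss, temperature, alpha, beta) are illustrative assumptions, not Distiller's actual API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=0.5, beta=1.0):
    """Combine hard-label CE, soft-target KL, and intermediate-layer MSE.

    Assumes student_hidden and teacher_hidden already have matching shapes
    (in practice a learned projection usually aligns their dimensions).
    """
    # Hard-label cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets from the teacher, with the usual T^2 scaling so the
    # gradient magnitude is comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Intermediate knowledge transfer: match hidden representations.
    mse = F.mse_loss(student_hidden, teacher_hidden)
    return (1 - alpha) * ce + alpha * kl + beta * mse
```

In a Distiller-style study, choices such as the weighting scheme (alpha, beta), the loss family for each term, and which intermediate layers are matched would be the searchable components of the pipeline, which is what allows each component's contribution to be quantified in isolation.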
Keywords: Meta Learning, Knowledge Distillation, AutoML
One-sentence Summary: A meta-learning framework for knowledge distillation in natural language processing and an experimental study toward an automated knowledge distillation framework.
Track: Main track
Reproducibility Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Reviewers: Haoyu He, he.haoy@northeastern.edu
CPU Hours: 0
GPU Hours: 0
TPU Hours: 0
Datasets And Benchmarks: GLUE, SQuAD
Performance Metrics: MSE, CE, Accuracy
Main Paper And Supplementary Material: pdf