Abstract: The goal of knowledge distillation (KD) is to transfer valuable knowledge from a strong teacher model to a weaker student model in order to bridge the performance gap between them. However, the conventional teacher-student paradigm of KD cannot be applied when the teacher and student models have different output representations (referred to as heterogeneous models). To overcome this limitation, we propose a model-agnostic approach to KD called MAKD. MAKD consists of two stages: informative sample generation and knowledge transfer. The key idea is to generate informative samples that convey the discrepancy between the teacher model and the student model, so that the generated samples are both effective and efficient. We formulate two ways of generating informative samples: word substitution and text generation. After generating the desired samples, we perform knowledge transfer by mixing the informative samples into the original training set and retraining the student model to improve its performance. We conduct experiments on two tasks, dependency parsing and grammatical error correction, and the results demonstrate that MAKD successfully enhances the performance of the student model on both tasks. Our data and code are available at https://github.com/WinnieHAN/Model_Agnostic_Knowledge_Distillation.
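To make the two-stage pipeline concrete, the following is a minimal sketch of the loop described above, not the authors' released implementation. The names `teacher`, `student`, `perturb`, and `retrain` are hypothetical stand-ins; only black-box predictions are assumed, which is what allows the teacher and student to be heterogeneous.

```python
def generate_informative_samples(teacher, student, inputs, perturb, k=3):
    """Stage 1: perturb inputs and keep those where teacher and student disagree."""
    informative = []
    for x in inputs:
        for _ in range(k):
            x_new = perturb(x)  # e.g. word substitution or text generation
            t_out, s_out = teacher.predict(x_new), student.predict(x_new)
            if t_out != s_out:                      # the sample exposes a discrepancy
                informative.append((x_new, t_out))  # label it with the teacher's output
    return informative


def knowledge_transfer(student, train_pairs, informative, retrain):
    """Stage 2: mix informative samples into the original training set and retrain."""
    augmented = list(train_pairs) + informative
    return retrain(student, augmented)
```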