Harnessing the Power of Prompt Experts: Efficient Knowledge Distillation for Enhanced Language Understanding

Published: 01 Jan 2024, Last Modified: 20 Feb 2025 · ECML/PKDD (8) 2024 · CC BY-SA 4.0
Abstract: Enhanced with machine learning, language understanding enables computers not only to comprehend human language but also to learn from it, augmenting the capabilities of many NLP applications. Multi-teacher distillation is a prominent method for knowledge transfer in language understanding, leveraging multiple teacher models to train a single student model. However, this approach incurs significant time and storage costs for training and inference with multiple teachers. To address these issues, we introduce PEE-KD, a simple yet effective framework that generates supervision for training a student model from a single language model. PEE-KD treats one language model conditioned on multiple prompts as the set of teachers in multi-teacher distillation, making both training and inference lightweight. Additionally, we propose an uncertainty-based method that improves the robustness and accuracy of the multiple prompts during training, along with a selector module that speeds up multi-teacher inference. Experiments on NLU and NER tasks show that PEE-KD improves accuracy by up to 1.8% and efficiency by up to 140% compared to existing methods. Logit visualizations comparing teacher and student models further validate the effectiveness of our approach. Our code and data are available at https://anonymous.4open.science/r/PEEKD-DF50/.
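
To illustrate the general idea of distilling from several prompt-conditioned views of a single language model, the sketch below fuses per-prompt teacher logits into one soft target and trains a student against it. This is not the authors' implementation: the uncertainty weighting (predictive entropy), the temperature, the loss mixing coefficient, and all function names are assumptions made for illustration only.

```python
# Minimal sketch (not the PEE-KD code) of multi-prompt teacher distillation:
# one language model queried with several prompts yields per-prompt logits,
# which are weighted by an uncertainty score (here, predictive entropy -- an
# assumption) and fused into a single soft target for the student.
import torch
import torch.nn.functional as F

def entropy_weights(teacher_logits: torch.Tensor) -> torch.Tensor:
    """teacher_logits: [num_prompts, batch, classes] -> weights [num_prompts, batch, 1]."""
    probs = teacher_logits.softmax(dim=-1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # [P, B]
    # A more confident prompt (lower entropy) receives a larger weight.
    return (-ent).softmax(dim=0).unsqueeze(-1)

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """KL to the fused multi-prompt teacher, mixed with standard cross-entropy."""
    w = entropy_weights(teacher_logits)                               # [P, B, 1]
    fused = (w * (teacher_logits / temperature).softmax(dim=-1)).sum(dim=0)  # [B, C]
    kd = F.kl_div((student_logits / temperature).log_softmax(dim=-1),
                  fused, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: 3 prompt "teachers", batch of 4 examples, 5 classes.
teacher = torch.randn(3, 4, 5)
student = torch.randn(4, 5, requires_grad=True)
labels = torch.randint(0, 5, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```

The selector module described in the abstract would sit in front of this loss, choosing a subset of prompts per input to reduce inference cost; its exact mechanism is not specified here and is therefore omitted from the sketch.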