Abstract: Knowledge distillation (KD) has emerged as an effective model compression method and has drawn significant attention. However, existing methods that employ intermediate representations face three primary limitations. Firstly, these methods require manually designed relational knowledge between intermediate representations. Secondly, their application scenarios are restricted: the teacher and student models must have identical representation dimensions or shared vocabularies. Lastly, these methods require additional time or memory expenditure to improve performance. To address these issues, we propose Learnable Relational Knowledge Distillation (LRKD). Firstly, LRKD autonomously learns relational knowledge via a dual orthogonal projection, without manual design. Secondly, LRKD matches differing representation dimensions through the projection and uses only mean-pooling to obtain sequence-level representations for alignment, thereby removing the influence of the vocabulary. Lastly, LRKD can be deployed without incurring additional overhead. Furthermore, we propose a multi-layer projection to construct more sophisticated relational knowledge. Experimental results demonstrate that LRKD outperforms advanced distillation methods.
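To make the mechanism described above concrete, below is a minimal, hypothetical sketch (not the authors' released code) of how dual learnable projections and mean-pooled sequence-level representations could be combined into a relational distillation loss. The class name `RelationalAlignLoss`, the shared dimension `d_shared`, and the use of in-batch pairwise similarities as the relational knowledge are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: dual (semi-)orthogonal projections map teacher and student
# hidden states of different widths into a shared space; mean-pooling yields
# sequence-level representations so vocabularies never need to match.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import orthogonal


class RelationalAlignLoss(nn.Module):
    def __init__(self, d_student: int, d_teacher: int, d_shared: int = 256):
        super().__init__()
        # Orthogonal parametrization keeps each projection semi-orthogonal
        # (an assumption about how the "orthogonal projection" is realized).
        self.proj_s = orthogonal(nn.Linear(d_student, d_shared, bias=False))
        self.proj_t = orthogonal(nn.Linear(d_teacher, d_shared, bias=False))

    @staticmethod
    def _mean_pool(hidden, mask):
        # hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens.
        mask = mask.unsqueeze(-1).float()
        return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-6)

    def forward(self, hidden_s, mask_s, hidden_t, mask_t):
        # Sequence-level representations via mean-pooling (separate masks, since
        # teacher and student may tokenize the same input differently).
        z_s = F.normalize(self.proj_s(self._mean_pool(hidden_s, mask_s)), dim=-1)
        z_t = F.normalize(self.proj_t(self._mean_pool(hidden_t, mask_t)), dim=-1)
        # Relational knowledge (assumed form): in-batch pairwise similarity matrices.
        rel_s = z_s @ z_s.t()
        rel_t = z_t @ z_t.t()
        # Align the student's relations to the teacher's.
        return F.mse_loss(rel_s, rel_t.detach())
```

Under these assumptions, the loss would be added to the student's task loss during training; because the projections are small linear maps used only at training time, they add negligible inference-time overhead.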
External IDs: dblp:conf/dasfaa/HuZLBLCZL25