Keywords: Neural document ranking, knowledge distillation
Abstract: Knowledge distillation is useful for training a neural document ranking model, employing a teacher to guide the student model's refinement.
Because a teacher may not perform well in all cases, over-calibration between the student and teacher models can make training less effective.
This paper studies a generalized KL divergence loss in a weighted form for refining ranking models in text document search,
and examines its formal properties in balancing knowledge distillation in adaptation to the relative performance of the teacher
and student models. This loss differentiates the roles of positive and negative documents for a training query, and
allows a student model to take a conservative or divergent approach in imitating the teacher's behavior when
the teacher model performs worse than the student model. This paper presents a detailed theoretical analysis, together with experiments, on the behavior and usefulness of this generalized loss.
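The abstract does not give the loss's exact definition; as a rough illustration of the general idea, the following is a minimal sketch of one plausible form of a per-document weighted KL divergence over a query's candidate documents. The function names, the softmax parameterization of teacher/student score distributions, and the specific weight values are illustrative assumptions, not the paper's formulation.

```python
import math

def softmax(scores):
    """Convert raw ranking scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_kl(teacher_scores, student_scores, weights):
    """Weighted KL divergence sum_d w_d * p_d * log(p_d / q_d), where p is the
    teacher's distribution and q the student's distribution over candidate
    documents for one query. The per-document weights w_d (an assumed
    mechanism here) can down-weight the teacher's signal, e.g. on negative
    documents or where the teacher is judged less reliable than the student.
    With all weights equal to 1 this reduces to the standard KL divergence."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(w * pi * math.log(pi / qi)
               for w, pi, qi in zip(weights, p, q))

# Example: one positive document followed by three negatives; the positive
# keeps full weight while the negatives are down-weighted (values arbitrary).
teacher = [4.0, 1.0, 0.5, 0.2]
student = [3.0, 1.5, 0.8, 0.1]
loss = weighted_kl(teacher, student, [1.0, 0.3, 0.3, 0.3])
```

With uniform weights the loss is the usual distillation objective; non-uniform weights let training loosen the match to the teacher selectively, which is the balancing behavior the abstract describes.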
Submission Number: 31