Abstract: High-quality models for natural language processing tasks such as summarization and chatbots often rely on large architectures, making them computationally intensive and challenging to deploy in resource-constrained environments. While knowledge distillation enables smaller student models to approximate the performance of larger teacher models, existing methods frequently face significant trade-offs between accuracy and efficiency. In addition, uncertain predictions from the teacher model can negatively impact the student's learning process. In this paper, we introduce CAKD, a novel approach that optimizes the training of student models by using confidence scores to selectively emphasize the teacher model's most reliable predictions. By integrating entropy-based confidence weighting into the distillation loss, CAKD prioritizes high-confidence samples, improving both performance and efficiency. Our experiments on text summarization (using a BART-based model on the CNN/DM dataset) and chatbot tasks (using a Llama-based model on the DailyDialog and PersonaChat datasets) demonstrate that CAKD achieves significant performance gains over its larger teacher models, with improvements of 10.53, 2.1, and 0.38 ROUGE-L points, respectively.
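For illustration, below is a minimal PyTorch sketch of how entropy-based confidence weighting can be folded into a distillation loss, in the spirit of the CAKD objective described in the abstract. This is not the authors' released implementation: the function name, the temperature T, and the choice of one-minus-normalized-entropy as the confidence score are assumptions made for the example.

    # Minimal sketch of entropy-based confidence weighting in a distillation loss.
    # Illustrative reconstruction only; the weighting function and hyperparameters
    # (e.g., temperature T) are assumptions, not the paper's exact formulation.
    import torch
    import torch.nn.functional as F

    def confidence_weighted_kd_loss(student_logits, teacher_logits, T=2.0, eps=1e-8):
        """Weight the per-sample KD loss by the teacher's confidence.

        Confidence is taken as 1 minus the normalized entropy of the teacher's
        predictive distribution, so low-entropy (confident) teacher predictions
        contribute more to the student's training signal.
        """
        teacher_probs = F.softmax(teacher_logits / T, dim=-1)      # (batch, vocab)
        log_teacher = torch.log(teacher_probs + eps)

        # Normalized entropy in [0, 1]; lower entropy => higher confidence.
        entropy = -(teacher_probs * log_teacher).sum(dim=-1)
        max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
        confidence = 1.0 - entropy / max_entropy                   # (batch,)

        # Standard KL distillation term per sample (summed over the vocabulary).
        log_student = F.log_softmax(student_logits / T, dim=-1)
        kl = F.kl_div(log_student, teacher_probs, reduction="none").sum(dim=-1)

        # Emphasize high-confidence teacher predictions.
        return (confidence * kl).mean() * (T ** 2)

In a sequence-to-sequence setting such as summarization or dialogue, the same function applies per token when the logits have shape (batch, seq_len, vocab_size).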
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Knowledge Distillation, Chatbot, Summarization
Languages Studied: English
Submission Number: 4425