Teachers That Listen: Adaptive Student-Aware Distillation for Reasoning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: Iterative Distillation, Knowledge transfer, Smaller models
TL;DR: An adaptive and iterative knowledge distillation approach
Abstract: Knowledge distillation is a standard approach for compressing the capabilities of large language models into smaller student models. However, standard distillation methods often produce suboptimal results because the rationales generated by the teacher are not matched to the specific learning needs of the student. In this paper, we introduce Adaptive Student-Aware Distillation for Reasoning (AdaptDistill), which bridges this gap by iteratively identifying the student's errors and having the teacher refine its explanations to address them. Each iteration targets the student's current learning deficiencies, prompting the teacher to produce tailored rationales that address these specific weaknesses. Empirical evaluations on a range of challenging mathematical and commonsense reasoning tasks show that AdaptDistill significantly outperforms standard distillation methods. Our work reframes knowledge distillation as an iterative teacher–student interaction in which the teacher dynamically refines its rationales in response to the student.
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 11138
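The iterative loop described in the abstract (find the student's errors, have the teacher write rationales aimed at those errors, retrain the student, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the addition task, the rule-based `toy_teacher`, the memorizing `ToyStudent`, and the `budget_per_round` limit are all hypothetical stand-ins for an LLM teacher, a fine-tuned student, and whatever data budget the real method uses. The sketch also assumes each round trains only on that round's targeted rationales; the abstract does not say whether data is accumulated across rounds.

```python
"""Toy sketch of an iterative, student-aware distillation loop.

Hypothetical assumptions (not from the paper): the task is integer addition,
the teacher is a rule-based function, and the student "learns" by memorizing
the final answer stated in each rationale.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Problem = Tuple[int, int]  # hypothetical toy task: compute a + b


def toy_teacher(problem: Problem, student_answer: Optional[int] = None) -> str:
    """Return a rationale; if the student's wrong answer is given, address it."""
    a, b = problem
    if student_answer is None:
        return f"{a} + {b} = {a + b}"
    return f"Your answer {student_answer} is wrong; {a} + {b} = {a + b}"


@dataclass
class ToyStudent:
    """Hypothetical student that memorizes answers from rationales."""
    memory: Dict[Problem, int] = field(default_factory=dict)

    def answer(self, problem: Problem) -> int:
        return self.memory.get(problem, 0)  # naive guess for unseen problems

    def train(self, examples: List[Tuple[Problem, str]]) -> None:
        for problem, rationale in examples:
            # Toy "learning": read the final answer after the last '='.
            self.memory[problem] = int(rationale.rsplit("=", 1)[1])


def adaptive_distill(
    student: ToyStudent,
    teacher: Callable[[Problem, Optional[int]], str],
    problems: List[Problem],
    budget_per_round: int,
    max_rounds: int,
) -> List[int]:
    """Repeat: evaluate student -> collect errors -> targeted rationales -> retrain.

    Returns the number of student errors observed at the start of each round.
    """
    error_history: List[int] = []
    for _ in range(max_rounds):
        # 1. Identify the student's current errors (wrong answer recorded).
        errors = [(p, student.answer(p)) for p in problems
                  if student.answer(p) != p[0] + p[1]]
        error_history.append(len(errors))
        if not errors:
            break
        # 2. The teacher writes rationales conditioned on each wrong answer,
        #    capped by a hypothetical per-round budget.
        batch = [(p, teacher(p, wrong)) for p, wrong in errors[:budget_per_round]]
        # 3. The student trains only on this round's targeted rationales.
        student.train(batch)
    return error_history


problems = [(1, 2), (3, 4), (5, 6), (0, 0), (7, 8)]
history = adaptive_distill(ToyStudent(), toy_teacher, problems,
                           budget_per_round=2, max_rounds=5)
print(history)  # [4, 2, 0]
```

Here the untrained student answers 0 everywhere, so it starts with four errors; with a budget of two targeted rationales per round, the error count falls to two and then to zero. The per-round budget is only there to make the iteration visible in a toy setting.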