Quantifying Cross-Domain Knowledge Distillation in the Presence of Domain Shift

ICLR 2026 Conference Submission 17891 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: knowledge distillation; domain adaptation; generalization error; random matrix
TL;DR: We theoretically quantify the potential performance gain from cross-domain knowledge distillation.
Abstract: Cross-domain knowledge distillation often suffers from domain shift. Although domain adaptation methods have shown strong empirical success in addressing this issue, their theoretical foundations remain underdeveloped. In this paper, we study knowledge distillation in a teacher–student framework for regularized linear regression and derive the high-dimensional asymptotic excess risk of the student estimator, accounting for both covariate shift and model shift. This asymptotic analysis enables a precise characterization of the performance gain from cross-domain knowledge distillation. Our results demonstrate that, even under substantial shifts between the source and target domains, it remains feasible to identify an imitation parameter for which the student model outperforms the student-only baseline. Moreover, we show that the student's generalization performance exhibits the double descent phenomenon.
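
The sketch below is a minimal numerical illustration of the setting described in the abstract, not the authors' construction: it assumes distillation blends target labels with teacher predictions through an imitation parameter, with ridge regression as the regularized linear model. The names (`lam`, `ridge`, the shift magnitudes) and the specific mixing rule are illustrative assumptions.

```python
# Hypothetical illustration of teacher-student distillation under covariate
# shift and model shift. The student is a ridge regressor fit on labels that
# mix ground truth with teacher predictions via an imitation parameter `lam`
# in [0, 1]; lam = 0 recovers the student-only baseline.
import numpy as np

rng = np.random.default_rng(0)
d, n_src, n_tgt, n_test = 50, 500, 100, 2000

# Model shift: source (teacher) and target (student) coefficients differ.
beta_src = rng.normal(size=d)
beta_tgt = beta_src + 0.3 * rng.normal(size=d)

# Covariate shift: source and target features have different scales/covariances.
X_src = 1.5 * rng.normal(size=(n_src, d))
X_tgt = rng.normal(size=(n_tgt, d))
X_test = rng.normal(size=(n_test, d))

y_src = X_src @ beta_src + 0.5 * rng.normal(size=n_src)
y_tgt = X_tgt @ beta_tgt + 0.5 * rng.normal(size=n_tgt)


def ridge(X, y, reg):
    """Closed-form regularized (ridge) linear regression estimator."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)


teacher = ridge(X_src, y_src, reg=1.0)


def student_excess_risk(lam, reg=1.0):
    """Target-domain excess risk of a student distilled with imitation parameter lam."""
    y_mix = (1.0 - lam) * y_tgt + lam * (X_tgt @ teacher)  # blend labels with teacher outputs
    w = ridge(X_tgt, y_mix, reg)
    return np.mean((X_test @ (w - beta_tgt)) ** 2)  # error relative to the target Bayes predictor


for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"lam = {lam:.2f}  excess risk = {student_excess_risk(lam):.4f}")
```

Sweeping `lam` in this toy setup mirrors the question studied in the paper: whether some intermediate imitation parameter yields lower target-domain excess risk than the student-only baseline (`lam = 0`) despite the shift between domains.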
Primary Area: learning theory
Submission Number: 17891