TRUST THE UNCERTAIN TEACHER: DISTILLING DARK KNOWLEDGE VIA CALIBRATED UNCERTAINTY

07 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Knowledge Distillation, Efficient Transfer Learning
TL;DR: This work improves knowledge distillation by transferring only informative signals while skipping overconfident knowledge.
Abstract: The core of knowledge distillation lies in transferring the teacher’s rich ‘dark knowledge’—subtle probabilistic patterns that reveal how classes are related and how uncertainty is distributed. While this idea is well established, teachers trained with conventional cross-entropy often fail to preserve such signals. Their distributions collapse into sharp, overconfident peaks that appear decisive but are in fact brittle, offering little beyond the hard label or subtly hindering representation-level transfer. To address this limitation, we revisit distillation from a distributional perspective and propose Calibrated Uncertainty Distillation (CUD), a framework designed to make dark knowledge more faithfully accessible. Instead of uncritically adopting the teacher’s overconfidence, CUD encourages teachers to reveal uncertainty where it is informative and guides students to learn from targets that convey calibrated rather than sharpened certainty. Across diverse benchmarks, CUD yields students that are not only more accurate, but also more calibrated under shift and more reliable on ambiguous, long-tail inputs.
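As background for the abstract above: standard knowledge distillation transfers the teacher’s softened class distribution through a temperature-scaled KL term, and post-hoc calibration (e.g., temperature scaling on held-out data) is one common way to make that distribution better reflect the teacher’s uncertainty. The sketch below combines only these two standard ingredients to illustrate the general notion of distilling from calibrated soft targets; it is not the CUD objective, whose exact formulation is given in the paper. The names `calibrate_temperature`, `distillation_loss`, `T_cal`, `T_kd`, and `alpha` are illustrative placeholders.

```python
# Minimal sketch: distill from a teacher whose logits are first calibrated via
# scalar temperature scaling. This illustrates "calibrated soft targets" in
# general; it is NOT the CUD method proposed in the paper.
import torch
import torch.nn.functional as F

def calibrate_temperature(teacher_logits, labels, lr=0.01, steps=200):
    """Fit a single calibration temperature on held-out data by minimizing NLL."""
    log_T = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_T], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(teacher_logits / log_T.exp(), labels)
        loss.backward()
        opt.step()
    return log_T.exp().item()

def distillation_loss(student_logits, teacher_logits, labels,
                      T_cal=1.0, T_kd=4.0, alpha=0.9):
    """Hard-label cross-entropy plus KL to the calibrated, softened teacher."""
    # Calibrate the teacher's logits first, then soften further for distillation.
    soft_teacher = F.softmax(teacher_logits / (T_cal * T_kd), dim=-1)
    log_soft_student = F.log_softmax(student_logits / T_kd, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * (T_kd ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In this sketch the calibration temperature `T_cal` would be fitted once on a validation split with `calibrate_temperature` and then reused inside `distillation_loss`; the `T_kd ** 2` factor is the usual scaling that keeps gradient magnitudes comparable across distillation temperatures.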
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 2785