Abstract: Self-Knowledge Distillation (SKD) leverages the student's own knowledge to create a virtual teacher for distillation when a pre-trained, heavyweight teacher is unavailable. While existing SKD approaches are highly effective in single-label learning, directly applying them to multi-label learning suffers from dramatic degradation due to the following inherent imbalance: \textit{targets annotated under a unified label set yet varying widely in visual scale are crammed into one image, resulting in biased learning toward major targets and an imbalance between precision and recall}. To address this issue, this paper proposes a novel SKD method for multi-label learning, named Multi-label Self-Knowledge Distillation (MSKD), which incorporates three Spatial Decoupling mechanisms: Locality-SD (L-SD), Reconstruction-SD (R-SD), and Step-SD (S-SD). L-SD exploits relational dark knowledge in regional outputs to amplify the model's perception of visual details. R-SD reconstructs global semantics by aggregating the regional outputs of local patches and uses them to guide the model. S-SD aligns the outputs of the same input at different optimization steps, seeking a synthesized optimization direction and avoiding overconfidence. In addition, MSKD combines our tailored loss, MBD, for balanced distillation. Extensive experiments demonstrate that MSKD not only outperforms previous approaches but also effectively mitigates biased learning and makes the model more robust.
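To make the spatial-decoupling idea concrete, the following is a minimal PyTorch-style sketch of how R-SD and S-SD could be expressed, assuming sigmoid multi-label outputs, a 2x2 patch grid, max-aggregation of regional predictions into a virtual global teacher, and a BCE-style alignment loss. These choices, along with all function names, are illustrative assumptions, not the paper's exact MSKD formulation (the MBD loss and the relational L-SD term are omitted).

```python
# Illustrative sketch only: R-SD and S-SD in the spirit described by the abstract.
# Assumptions (not from the paper): sigmoid multi-label heads, a 2x2 patch grid,
# max over patches to reconstruct a global prediction, BCE as the alignment loss,
# and a backbone that accepts variable input sizes (e.g. global-pooled head).

import torch
import torch.nn.functional as F


def regional_logits(model, images, grid=2):
    """Split each image into a grid of patches and collect per-patch logits."""
    _, _, h, w = images.shape
    ph, pw = h // grid, w // grid
    logits = []
    for i in range(grid):
        for j in range(grid):
            patch = images[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            logits.append(model(patch))            # (B, num_classes) per patch
    return torch.stack(logits, dim=1)              # (B, grid*grid, num_classes)


def rsd_loss(model, images):
    """R-SD sketch: reconstruct global semantics from regional outputs
    (here via a max over patches) and use them as a virtual teacher
    for the model's global prediction."""
    global_prob = torch.sigmoid(model(images))
    with torch.no_grad():
        regional_prob = torch.sigmoid(regional_logits(model, images))
        teacher_prob = regional_prob.max(dim=1).values  # aggregate patch views
    return F.binary_cross_entropy(global_prob, teacher_prob)


def ssd_loss(model, images, prev_logits):
    """S-SD sketch: align the current output with the output of the same
    input cached from an earlier optimization step to temper overconfidence."""
    cur_prob = torch.sigmoid(model(images))
    with torch.no_grad():
        prev_prob = torch.sigmoid(prev_logits)
    return F.binary_cross_entropy(cur_prob, prev_prob)
```

In this sketch the regional teacher is detached from the graph, so only the global branch receives distillation gradients; the hypothetical `prev_logits` would be cached from a previous forward pass of the same batch.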