Transpose and Mask: Simple and Effective Logit-Based Knowledge Distillation for Multi-attribute and Multi-label Classification

Published: 01 Jan 2023 · Last Modified: 11 Apr 2025 · PRCV (10) 2023 · CC BY-SA 4.0
Abstract: Knowledge distillation (KD) improves a student network by transferring knowledge from a teacher network. Although KD has been extensively studied for single-label image classification, it remains underexplored for multi-attribute and multi-label classification. We observe that logit-based KD methods for the single-label setting exploit information from multiple classes within a single sample, but such logits are far less informative in the multi-label setting. To address this challenge, we design a Transpose strategy that extracts information from multiple samples in a batch rather than from a single sample. We further note that certain classes may lack positive samples in a batch, which can harm training. To address this issue, we design another strategy, the Mask, to prevent such negative samples from influencing the distillation. Combining the two, we propose Transpose and Mask Knowledge Distillation (TM-KD), a simple and effective logit-based KD framework for multi-attribute and multi-label classification. The effectiveness of TM-KD is confirmed by experiments on multiple tasks and datasets, including pedestrian attribute recognition (PETA, PETA-zs, PA100k), clothing attribute recognition (Clothing Attributes Dataset), and multi-label classification (MS COCO), showing impressive and consistent performance gains.
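The abstract does not give implementation details, but the Transpose and Mask ideas it describes can be sketched as a distillation loss over teacher and student logits of shape (batch, classes): distributions are formed over the batch dimension for each class (Transpose), and classes with no positive sample in the batch are dropped (Mask). The following is a minimal, hypothetical sketch in PyTorch; the function name tm_kd_loss, the temperature-scaled KL objective, and the exact normalization are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def tm_kd_loss(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor,
               labels: torch.Tensor,
               temperature: float = 4.0) -> torch.Tensor:
    """Illustrative transpose-and-mask distillation loss (assumed, not official code).

    student_logits, teacher_logits: raw logits of shape (B, C) for one batch.
    labels: binary multi-label targets of shape (B, C).
    """
    # Transpose: treat each class as a distribution over the B samples in the batch,
    # instead of treating each sample as a distribution over C classes.
    s_t = student_logits.t() / temperature   # (C, B)
    t_t = teacher_logits.t() / temperature   # (C, B)

    # Per-class KL divergence between teacher and student, computed over the batch dim.
    log_p_student = F.log_softmax(s_t, dim=1)
    p_teacher = F.softmax(t_t, dim=1)
    kl_per_class = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)  # (C,)

    # Mask: ignore classes that have no positive sample in this batch
    # (assumed interpretation of the Mask strategy described in the abstract).
    has_positive = (labels.sum(dim=0) > 0).float()  # (C,)
    denom = has_positive.sum().clamp(min=1.0)
    return (kl_per_class * has_positive).sum() / denom * (temperature ** 2)
```

The temperature-squared scaling follows the common Hinton-style KD convention; in practice such a distillation term would typically be added to the student's task loss (e.g., binary cross-entropy over attributes), though the weighting used by the paper is not stated in the abstract.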