MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: The effectiveness of contrastive-learning-based Knowledge Distillation (KD) has sparked renewed interest in relational distillation, but these methods typically focus on angle-wise information from the penultimate layer. We show that exploiting relational information derived from intermediate layers further improves the effectiveness of distillation. We also find that adding distance-wise relational information to contrastive-learning-based methods degrades distillation quality, revealing an implicit contention between angle-wise and distance-wise attributes. Therefore, we propose a ${\bf{M}}$ulti-stage ${\bf{D}}$ecoupled ${\bf{R}}$elational (MDR) KD framework equipped with an adaptive stage selection strategy that identifies the stages maximizing the efficacy of transferring relational knowledge. The MDR framework decouples angle-wise and distance-wise information to resolve their conflict while still preserving complete relational knowledge, thereby improving transfer efficiency and distillation quality. To evaluate the proposed method, we conduct extensive experiments on multiple image benchmarks ($\textit{i.e.}$ CIFAR-100, ImageNet and Pascal VOC), covering various tasks ($\textit{i.e.}$ classification, few-shot learning, transfer learning and object detection). Our method exhibits superior performance under diverse scenarios, surpassing the state of the art by an average of 1.22\% on CIFAR-100 across widely used teacher-student network pairs.
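The sketch below illustrates, under our own assumptions rather than the authors' exact formulation, what decoupling angle-wise and distance-wise relational knowledge can look like in practice: the two relational terms are computed and weighted separately (in the spirit of RKD-style relational losses), so the distance-wise signal can be tuned or dropped without disturbing the angle-wise one.

```python
# Hypothetical sketch of decoupled relational losses (not the authors' exact
# formulation): angle-wise and distance-wise relations are computed separately
# so each can be weighted and transferred independently.
import torch
import torch.nn.functional as F


def pairwise_distances(feat: torch.Tensor) -> torch.Tensor:
    """Scale-normalized pairwise Euclidean distances within a batch of embeddings."""
    dist = torch.cdist(feat, feat, p=2)                # (B, B)
    mean = dist[dist > 0].mean().clamp_min(1e-12)      # mean off-diagonal distance
    return dist / mean


def angle_relations(feat: torch.Tensor) -> torch.Tensor:
    """Cosines of the angles formed by every (anchor, i, j) triplet of embeddings."""
    diff = feat.unsqueeze(0) - feat.unsqueeze(1)       # (B, B, D) difference vectors
    diff = F.normalize(diff, p=2, dim=2)
    return torch.bmm(diff, diff.transpose(1, 2))       # (B, B, B) cosine matrix


def decoupled_relational_loss(f_s, f_t, w_angle=1.0, w_dist=1.0):
    """Return angle-wise and distance-wise terms separately, so their weights
    (and the stages they are applied to) can be chosen independently."""
    loss_angle = F.smooth_l1_loss(angle_relations(f_s), angle_relations(f_t))
    loss_dist = F.smooth_l1_loss(pairwise_distances(f_s), pairwise_distances(f_t))
    return w_angle * loss_angle, w_dist * loss_dist
```

Keeping the two terms separate is the point of the decoupling: a stage where the distance-wise relation conflicts with the angle-wise one can simply receive a smaller (or zero) distance weight.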
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: Current multimedia technologies rely on increasingly large neural networks, which require substantial computing and storage resources, making model deployment expensive and cumbersome. Knowledge distillation, as a highly compatible and generalizable model compression technique, can effectively address this limitation. We propose a novel framework equipped with an adaptive stage selection strategy for relation-based knowledge distillation, which enables efficient extraction of relational information across multiple stages. By decoupling relations into angle-wise and distance-wise components and introducing a novel training method for the self-supervised module, our approach enables the student to acquire knowledge more effectively. Experimental results show that our method significantly surpasses state-of-the-art performance on standard image classification benchmarks in the field of KD. It also opens the door to further improvements in relation-based knowledge transfer methods.
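As an illustration of the adaptive stage selection described above, the following minimal sketch (hypothetical class and parameter names; not the paper's implementation) combines per-stage relational losses through learnable gates, so that stages whose relational knowledge transfers most effectively receive larger weights.

```python
# Minimal sketch, assuming learnable softmax gates over candidate stages;
# the paper's actual selection criterion may differ.
import torch
import torch.nn as nn


class AdaptiveStageSelector(nn.Module):
    def __init__(self, num_stages: int):
        super().__init__()
        # One learnable logit per candidate stage (intermediate + penultimate).
        self.gate_logits = nn.Parameter(torch.zeros(num_stages))

    def forward(self, stage_losses):
        # Softmax weights emphasize the stages that contribute the most
        # useful relational knowledge during training.
        weights = torch.softmax(self.gate_logits, dim=0)
        return sum(w * l for w, l in zip(weights, stage_losses))
```

In practice the selection signal could also come from a validation metric rather than gradient-learned gates; the sketch only shows the interface between per-stage losses and the selector.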
Supplementary Material: zip
Submission Number: 4478