Class Specialized Knowledge Distillation

Published: 01 Jan 2022, Last Modified: 05 May 2023
ACCV (2) 2022
Abstract: Knowledge Distillation (KD) is a compression framework that transfers knowledge from a teacher to a smaller student model. KD approaches conventionally address problem domains where the teacher and student networks classify over the same number of classes. We provide a knowledge distillation solution tailored for class specialization, where the user requires a compact and performant network specializing in a subset of the classes used to train the teacher model. To this end, we introduce a novel knowledge distillation framework, Class Specialized Knowledge Distillation (CSKD), that combines two loss functions, Renormalized Knowledge Distillation (RKD) and Intra-Class Variance (ICV), to produce a computationally efficient, specialized student network. We report results on several popular architectures and benchmark tasks. In particular, CSKD consistently demonstrates significant performance improvements over teacher models for highly restrictive specialization tasks (e.g., instances where the number of subclasses or the dataset size is relatively small), in addition to outperforming other state-of-the-art knowledge distillation approaches on class specialization tasks.
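The abstract does not reproduce the loss definitions, but the core idea of distilling against a teacher restricted to a retained class subset can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `renormalized_kd_loss`, the temperature `T`, and the assumption that renormalization means applying a softmax over only the subset of teacher logits are assumptions made here; the ICV term is omitted.

```python
import torch
import torch.nn.functional as F

def renormalized_kd_loss(student_logits, teacher_logits, subset_idx, T=4.0):
    """Distillation loss over the retained class subset only (illustrative sketch).

    student_logits: (B, |subset|) logits from the specialized student.
    teacher_logits: (B, num_teacher_classes) logits from the full teacher.
    subset_idx:     LongTensor of class indices the student specializes in.
    T:              softmax temperature (hypothetical default).
    """
    # Restrict the teacher to the subset classes and renormalize via softmax.
    teacher_sub = teacher_logits[:, subset_idx] / T
    teacher_prob = F.softmax(teacher_sub, dim=1)
    # Match the student's distribution to the renormalized teacher distribution.
    student_logp = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean") * (T * T)
```

In this sketch the student head only has outputs for the specialized subset, so the teacher's full-class distribution must be renormalized before the KL term is computed; the combined CSKD objective described in the paper additionally adds the ICV loss.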