The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
Abstract: In the context of label-efficient learning on video
data, the distillation method and the structural design of the
teacher-student architecture have a significant impact on knowledge distillation. However, the relationship between these factors
has been overlooked in previous research. To address this gap,
we propose a new weakly supervised learning framework for
knowledge distillation in video classification that is designed to
improve the efficiency and accuracy of the student model. Our
approach leverages the concept of substage-based learning to
distill knowledge based on the combination of student substages and the correlation between corresponding teacher and student substages. We also employ
the progressive cascade training method to address the accuracy
loss caused by the large capacity gap between the teacher and
the student. Additionally, we propose a pseudo-label optimization strategy to improve the quality of the initial data labels. To optimize the loss
functions of different distillation substages during the training
process, we introduce a new loss function based on feature distributions. We conduct extensive experiments on both real and
simulated data sets, demonstrating that our proposed approach
outperforms existing distillation methods in terms of knowledge
distillation for video classification tasks. Our proposed substage-based distillation approach has the potential to inform future
research on label-efficient learning for video data.
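As a rough illustration of the substage idea summarized above, the sketch below combines a soft-label distillation term on the final logits with per-substage feature-matching terms in PyTorch. The class name `SubstageKDLoss`, the equal substage weights, the MSE feature-matching term, and the hyperparameters `temperature` and `alpha` are illustrative assumptions, not the paper's actual formulation; in practice a projection layer would also be needed wherever student and teacher feature shapes differ.

```python
import torch.nn as nn
import torch.nn.functional as F


class SubstageKDLoss(nn.Module):
    """Sketch of a substage-based distillation loss (assumed form).

    Combines a standard soft-label KD term with feature-matching
    terms over corresponding teacher/student substages.
    """

    def __init__(self, num_substages, temperature=4.0, alpha=0.5):
        super().__init__()
        self.temperature = temperature
        self.alpha = alpha
        # Equal per-substage weights here; the paper's framework
        # balances substages during training rather than fixing them.
        self.substage_weights = [1.0 / num_substages] * num_substages

    def forward(self, student_logits, teacher_logits,
                student_feats, teacher_feats, labels):
        # Soft-label distillation on the final logits (Hinton-style).
        t = self.temperature
        kd = F.kl_div(
            F.log_softmax(student_logits / t, dim=1),
            F.softmax(teacher_logits / t, dim=1),
            reduction="batchmean",
        ) * (t * t)

        # Feature matching for each corresponding substage pair;
        # assumes shapes already agree (else insert a projection).
        feat_loss = sum(
            w * F.mse_loss(s, te)
            for w, s, te in zip(self.substage_weights,
                                student_feats, teacher_feats)
        )

        # Hard-label supervision on the (possibly pseudo-) labels.
        ce = F.cross_entropy(student_logits, labels)
        return self.alpha * (kd + feat_loss) + (1.0 - self.alpha) * ce
```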