Abstract: Knowledge Distillation (KD) is a popular technique to
transfer knowledge from a teacher model or ensemble to
a student model. Its success is generally attributed to the
privileged information on similarities/consistency between
the class distributions or intermediate feature representations of the teacher model and the student model. However, directly pushing the student model to mimic the probabilities/features of the teacher model largely limits the student model from learning undiscovered knowledge/features. In this paper, we propose a novel inheritance
and exploration knowledge distillation framework (IE-KD),
in which a student model is split into two parts - inheritance
and exploration. The inheritance part is learned with a similarity loss to transfer the existing learned knowledge from
the teacher model to the student model, while the exploration part is encouraged to learn representations different
from the inherited ones with a dissimilarity loss. Our IE-KD framework is generic and can be easily combined with
existing distillation or mutual learning methods for training
deep neural networks. Extensive experiments demonstrate
that these two parts can jointly push the student model to
learn more diversified and effective representations, and
our IE-KD can serve as a general technique that improves the student network to achieve state-of-the-art (SOTA) performance. Furthermore,
by applying our IE-KD to the training of two networks, the performance of both can be improved compared with deep mutual learning.
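
The abstract describes splitting the student's representation into an inherited part trained with a similarity loss toward the teacher and an exploratory part trained with a dissimilarity loss. The sketch below illustrates that idea in PyTorch under stated assumptions; the class name IEKDLoss, the channel-wise split, the 1x1 projections, the lambda weights, and the L1-style loss forms are illustrative choices, not the paper's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the inheritance/exploration idea (not the authors' exact losses).
# Assumptions: the student feature map is split channel-wise into an "inheritance"
# half and an "exploration" half, both projected into the teacher's feature space,
# and teacher/student feature maps share the same spatial resolution.
class IEKDLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels,
                 lambda_inherit=1.0, lambda_explore=1.0):
        super().__init__()
        half = student_channels // 2
        # 1x1 projections so both student halves match the teacher's channel count
        self.proj_inherit = nn.Conv2d(half, teacher_channels, kernel_size=1)
        self.proj_explore = nn.Conv2d(half, teacher_channels, kernel_size=1)
        self.lambda_inherit = lambda_inherit
        self.lambda_explore = lambda_explore

    @staticmethod
    def _normalize(x):
        # L2-normalize flattened features so the losses compare directions, not magnitudes
        return F.normalize(x.flatten(1), dim=1)

    def forward(self, student_feat, teacher_feat):
        half = student_feat.size(1) // 2
        f_inherit, f_explore = student_feat[:, :half], student_feat[:, half:]

        t = self._normalize(teacher_feat.detach())          # teacher is fixed
        s_in = self._normalize(self.proj_inherit(f_inherit))
        s_ex = self._normalize(self.proj_explore(f_explore))

        # Inheritance: pull the first half toward the teacher's representation
        loss_inherit = (s_in - t).abs().mean()
        # Exploration: push the second half away from the teacher's representation
        loss_explore = -(s_ex - t).abs().mean()

        return self.lambda_inherit * loss_inherit + self.lambda_explore * loss_explore
```

In training, such a term would be added to the student's usual task loss (e.g., cross-entropy), so the inherited features stay close to the teacher while the exploratory features are driven toward complementary representations.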