Abstract: To address the limitations of current self knowledge distillation methods, which fail to fully utilize the knowledge of shallow exits and neglect the impact of the auxiliary exits' structure on network performance, we propose a novel self knowledge distillation framework via virtual teacher-students mutual learning, named LOTH. A knowledgeable virtual teacher is constructed from the rich feature maps of each exit to guide the learning of every exit. Meanwhile, the logit knowledge of each exit is incorporated to guide the learning of the virtual teacher. The two learn mutually through the well-designed loss in LOTH. Moreover, two kinds of auxiliary building blocks are designed to balance the efficiency and effectiveness of the network. Extensive experiments with diverse backbones on CIFAR-100 and Tiny-ImageNet validate the effectiveness of LOTH, which achieves superior performance with fewer resources compared with state-of-the-art distillation methods. The code of LOTH is available on GitHub.
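To make the mutual-learning idea concrete, below is a minimal PyTorch-style sketch of such a bidirectional distillation loss. It is not the authors' implementation: the function name `mutual_distillation_loss`, the assumption that the virtual teacher's logits come from a head over the fused exit feature maps, and the hyper-parameters `tau`, `alpha`, and `beta` are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: each exit mimics the virtual teacher's soft labels,
# while the averaged exit logits in turn guide the virtual teacher.
def mutual_distillation_loss(exit_logits, teacher_logits, targets,
                             tau=3.0, alpha=1.0, beta=1.0):
    """exit_logits: list of [B, C] tensors, one per auxiliary/final exit.
    teacher_logits: [B, C] tensor from the virtual teacher head (assumed
    to be built on the fused feature maps of the exits)."""
    # Standard cross-entropy on every exit and on the virtual teacher.
    ce = sum(F.cross_entropy(z, targets) for z in exit_logits)
    ce = ce + F.cross_entropy(teacher_logits, targets)

    # Teacher -> students: each exit matches the teacher's softened output.
    t_soft = F.softmax(teacher_logits.detach() / tau, dim=1)
    kd_students = sum(
        F.kl_div(F.log_softmax(z / tau, dim=1), t_soft, reduction="batchmean")
        for z in exit_logits) * tau ** 2

    # Students -> teacher: the averaged exit logits guide the teacher.
    avg_soft = F.softmax(
        torch.stack([z.detach() for z in exit_logits]).mean(0) / tau, dim=1)
    kd_teacher = F.kl_div(F.log_softmax(teacher_logits / tau, dim=1),
                          avg_soft, reduction="batchmean") * tau ** 2

    return ce + alpha * kd_students + beta * kd_teacher
```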
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: Convolutional neural networks with superior performance typically require expensive computational and memory overheads, which makes them difficult to deploy on resource-limited edge devices. In this paper, we provide an efficient self-distillation approach that mines knowledge from the network itself to improve the performance of various popular architectures. Our approach enhances the feature representation of image media and enables low-capacity networks to match or even exceed the performance of high-capacity networks, thus facilitating deployment on edge devices. Importantly, our approach can be used for a wide range of multimedia applications.
Submission Number: 4883