Visual emotion analysis using skill-based multi-teacher knowledge distillation

Published: 01 Jan 2025, Last Modified: 13 May 2025Pattern Anal. Appl. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: The biggest challenge in visual emotion analysis (VEA) is bridging the affective gap between the features extracted from an image and the emotion it expresses. It is therefore essential to rely on multiple cues to have decent predictions. Recent approaches use deep learning models to extract rich features in an automated manner, through complex frameworks built with multi-branch convolutional neural networks and fusion or attention modules. This paper explores a different approach, by introducing a three-step training scheme and leveraging knowledge distillation (KD), which reconciles effectiveness and simplicity, and thus achieves promising performances despite using a very basic CNN. KD is involved in the first step, where a student model learns to extract the most relevant features on its own, by reproducing those of several teachers specialized in different tasks. The proposed skill-based multi-teacher knowledge distillation (SMKD) loss also ensures that for each instance, the student focuses more or less on the teachers depending on their capacity to obtain a good prediction, i.e. their relevance. The two remaining steps serve respectively to train the student’s classifier and to fine-tune the whole model, both for the VEA task. Experiments on two VEA databases demonstrate the gain in performance offered by our approach, where the students consistently outperform their teachers, and also state-of-the-art methods.
Loading