MVT-CEAM: a lightweight MobileViT with channel expansion and attention mechanism for facial expression recognition

Kunxia Wang, Wancheng Yu, Takashi Yamauchi

Published: 2024, Last Modified: 17 Mar 2026, Signal Image Video Process. 2024, CC BY-SA 4.0

Abstract: Facial expression recognition is an important research area in psychology with applications in fields such as intelligent healthcare, human-computer interaction, and fuzzy control. However, current deep learning models often suffer from high complexity, expensive computational requirements, and large parameter counts, which hinder their deployment on resource-constrained mobile terminals. To address these challenges, this paper proposes MVT-CEAM, an improved lightweight MobileViT with channel expansion and an attention mechanism for facial expression recognition. The model adopts a channel expansion strategy to extract more critical facial expression feature information from multi-scale feature maps, and it introduces a channel attention module to improve feature extraction performance. Compared with typical lightweight models, the proposed model significantly improves accuracy while remaining lightweight. It achieves 94.35% and 87.41% accuracy on the KDEF and RAF-DB datasets, respectively, demonstrating superior recognition performance.
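The abstract does not specify how the channel attention module is built. As a rough illustration only, the sketch below assumes a squeeze-and-excitation style design: global average pooling per channel, a small bottleneck of two fully connected layers, and a sigmoid gate that rescales each channel. The function name, the weight matrices `w1` and `w2`, and all sizes are hypothetical and are not taken from the paper.

```python
import math
import random


def channel_attention(feature_map, w1, w2):
    """SE-style channel attention on a feature map shaped [C][H][W] (nested lists).

    Assumed design, not the paper's confirmed module.
    w1: [C_reduced][C] weights of the bottleneck FC layer (hypothetical).
    w2: [C][C_reduced] weights of the FC layer back to C channels (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck FC layer followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # Excitation, step 2: FC layer back to C channels, then a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


random.seed(0)
C, H, W, R = 4, 2, 2, 2  # channels, height, width, bottleneck width (illustrative)
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]
y, gates = channel_attention(x, w1, w2)
```

In this sketch the output keeps the input shape, and each channel is multiplied by a single learned weight between 0 and 1, which is one common way a network can emphasize informative channels at low parameter cost.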

External IDs: dblp:journals/sivp/WangYY24