Multiple Attention to Weight Fusion based Network for in-the-Wild Facial Expression Recognition

Kuan-Hsien Liu, Wen-Ren Liu, Tsung-Jung Liu, Wei-Shen Tai

Published: 2024, Last Modified: 11 Jul 2025, ICCE-Taiwan 2024, CC BY-SA 4.0

Abstract: Obstructed facial images in real-world settings pose challenges for facial expression recognition (FER). Environmental factors, camera limitations, subject variability, and experimental conditions lead to low image quality and make classification difficult. In response, we propose a cost-effective FER model comprising four modules (facial features extraction, facial features attention, stages weight-MLP, and stages weight fusion) to address these challenges. The facial features extraction module extracts different features at multiple stages, while the facial features attention module uses multiple kernels to focus attention on relevant features. The stages weight-MLP module downsamples the length of each stage's weights while preserving their tendencies, and the stages weight fusion module integrates the weights from multiple stages to classify emotions. The model requires 2.4 GFLOPs and has 14.4M parameters. We pre-trained the backbone on the MS-Celeb-1M dataset and evaluated the model on RAF-DB and AffectNet, achieving accuracies of 89.6% and 62.3%, respectively. The code for the proposed model will be released on GitHub for further exploration and use.
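The abstract does not specify how the stage weights are downsampled or fused, so the following is only a toy sketch of the general idea of multi-stage weight fusion, not the authors' implementation. It assumes that each backbone stage has already produced a 1-D attention-weight vector (the multi-kernel attention module is not modeled). Simple adaptive average pooling stands in for the stages weight-MLP: it shrinks each vector to one value per class while keeping its overall trend. A hypothetical scalar importance per stage, softmax-normalized, combines the stages. All names here (`adaptive_avg_pool`, `fuse_stages`, `stage_importance`, the label order in `EMOTIONS`) are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical label set: the seven basic expressions commonly used with RAF-DB.
EMOTIONS = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_avg_pool(values: Sequence[float], out_len: int) -> List[float]:
    """Downsample a 1-D vector to out_len bins by averaging, keeping its trend.

    Stand-in for the stages weight-MLP (an assumption, not the paper's design).
    Bin i spans [floor(i*n/out_len), ceil((i+1)*n/out_len)).
    """
    n = len(values)
    if not 1 <= out_len <= n:
        raise ValueError("out_len must be between 1 and len(values)")
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -((-(i + 1) * n) // out_len)  # ceiling division
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


def fuse_stages(
    stage_weights: Sequence[Sequence[float]],
    stage_importance: Sequence[float],
    num_classes: int,
) -> List[float]:
    """Fuse per-stage weight vectors (possibly of different lengths) into class probabilities."""
    if len(stage_weights) != len(stage_importance):
        raise ValueError("need one importance score per stage")
    alphas = softmax(stage_importance)  # stage contributions sum to 1
    fused = [0.0] * num_classes
    for alpha, weights in zip(alphas, stage_weights):
        pooled = adaptive_avg_pool(weights, num_classes)
        for k in range(num_classes):
            fused[k] += alpha * pooled[k]
    return softmax(fused)


# Usage with made-up stage outputs of different lengths.
stages = [
    [0.1 * i for i in range(28)],  # stage 1: longer, increasing vector
    [0.2] * 14,                    # stage 2: flat vector
    [1.0, 0.0] * 7,                # stage 3: alternating vector
]
probs = fuse_stages(stages, stage_importance=[0.0, 0.5, 1.0], num_classes=len(EMOTIONS))
print(EMOTIONS[probs.index(max(probs))], round(sum(probs), 6))
```

Pooling to exactly `num_classes` bins keeps the sketch small. A learned MLP, as the abstract describes, could instead map each stage's weights to any intermediate length before fusion.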