Abstract: Facial expression recognition (FER) in real-world settings is challenging: environmental factors, camera limitations, subject variability, and experimental conditions can obstruct or degrade facial images, lowering image quality and making classification difficult. To address these challenges, we propose a cost-effective FER model comprising four modules: facial features extraction, facial features attention, stages weight-MLP, and stages weight fusion. The facial features extraction module extracts different features at multiple stages, while the facial features attention module uses multiple kernels to focus attention on relevant features. The stages weight-MLP module downsamples the weight vectors to a shorter length while preserving their tendencies, and the stages weight fusion module integrates the weights from multiple stages to classify emotions. The model requires 2.4G FLOPs and has 14.4M parameters. We pre-train the backbone on the MS-Celeb-1M dataset and evaluate the model on RAF-DB and AffectNet, achieving accuracies of 89.6% and 62.3%, respectively. The code for the proposed model will be released on GitHub for further exploration and use.
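To make the data flow of the last two modules concrete, the following is a minimal sketch and not the released implementation. It assumes that each stage yields a weight vector of a different length, that every vector is shortened to one value per emotion class, and that the shortened vectors are averaged before taking the arg-max. Bin averaging stands in for the learned stages weight-MLP, and the names `downsample`, `fuse`, and `predict` are hypothetical.

```python
from typing import List, Sequence


def downsample(weights: Sequence[float], out_len: int) -> List[float]:
    """Shorten a weight vector to out_len values by averaging contiguous bins.

    Assumption: this fixed pooling is only a stand-in for the learned
    stages weight-MLP; it keeps the overall tendency of the vector.
    """
    n = len(weights)
    if out_len <= 0 or out_len > n:
        raise ValueError("out_len must be between 1 and len(weights)")
    out = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = ((i + 1) * n) // out_len
        chunk = weights[start:end]
        out.append(sum(chunk) / len(chunk))
    return out


def fuse(stage_weights: Sequence[Sequence[float]], num_classes: int) -> List[float]:
    """Bring every stage to num_classes values, then average across stages."""
    reduced = [downsample(w, num_classes) for w in stage_weights]
    return [sum(col) / len(reduced) for col in zip(*reduced)]


def predict(stage_weights: Sequence[Sequence[float]], num_classes: int) -> int:
    """Return the index of the class with the largest fused weight."""
    fused = fuse(stage_weights, num_classes)
    return max(range(num_classes), key=fused.__getitem__)


# Toy input: two stages with vectors of length 8 and 4, four classes.
stages = [[0, 0, 1, 1, 4, 4, 0, 0], [0, 2, 6, 1]]
print(fuse(stages, 4))
print(predict(stages, 4))
```

In the model described above the reduction is learned and the fusion rule may differ from a plain average; the sketch only illustrates how stage outputs of different lengths can be brought to a common length and combined into one class decision.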