Expression recognition method based on feature redundancy optimization

Published: 01 Jan 2025 · Last Modified: 22 Jul 2025 · Signal, Image and Video Processing, 2025 · CC BY-SA 4.0
Abstract: Facial expression recognition (FER) plays a crucial role in domains such as healthcare and access security. Traditional models rely primarily on convolutional networks to extract features such as facial landmarks and the positions of facial components. However, these methods often produce feature maps with significant redundancy that contributes little to network performance. To address this limitation, we propose the DPConv module, which segments the channel dimension and applies two convolutional kernel sizes to the resulting groups. Substituting DPConv for several convolutional blocks in the POSTER++ (Mao et al. in POSTER++: A Simpler and Stronger Facial Expression Recognition Network. arXiv:2301.12149, 2023) architecture reduces the parameter count while improving both efficiency and accuracy. Moreover, we propose a sliding window multi-head cross-self-attention mechanism, built on sliding window multi-head self-attention (Liu et al. in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021), which replaces the conventional attention mechanism, facilitates the modeling of global dependencies, and further improves the network's overall performance. Our model, DPPOSTER, was evaluated on the RAF-DB, FERPlus and SFEW datasets, with experimental comparisons across different combinations of convolution kernel sizes and channel segmentation ratios. DPPOSTER outperformed POSTER++ by 0.59%, 0.37% and 2.32% on RAF-DB, FERPlus and SFEW, respectively.
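The abstract describes DPConv as splitting the channel dimension and applying two different kernel sizes to the resulting groups. The paper's actual implementation is not reproduced here; the following is a minimal numpy sketch of that idea, in which the function names (`dpconv`, `depthwise_conv2d`) and the `split_ratio`, `k_small`, `k_large` parameters are illustrative assumptions, not the authors' API:

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Naive depthwise conv: x is (C, H, W), kernels is (C, k, k).
    Stride 1 with zero padding so spatial size is preserved."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def dpconv(x, split_ratio=0.5, k_small=3, k_large=5, rng=None):
    """Sketch of a DPConv-style block: split the channels by `split_ratio`,
    run a small-kernel conv on one group and a large-kernel conv on the
    other, then concatenate the results along the channel axis.
    (Random weights stand in for learned parameters.)"""
    rng = rng or np.random.default_rng(0)
    C = x.shape[0]
    c1 = int(C * split_ratio)          # channels routed to the small kernel
    k1 = rng.standard_normal((c1, k_small, k_small))
    k2 = rng.standard_normal((C - c1, k_large, k_large))
    y1 = depthwise_conv2d(x[:c1], k1)
    y2 = depthwise_conv2d(x[c1:], k2)
    return np.concatenate([y1, y2], axis=0)
```

Because each kernel touches only its own channel group, the parameter count is lower than applying the larger kernel to every channel, which is consistent with the parameter reduction the abstract reports for replacing full convolutional blocks.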