Enhancing Facial Expression Recognition by Integrating Global Dependencies with Modified Non-Local Convolutional Neural Networks

Published: 2024, Last Modified: 06 Nov 2025, CVMI 2024, CC BY-SA 4.0
Abstract: Facial Expression Recognition (FER) has advanced significantly with the use of Convolutional Neural Networks (CNNs), a prominent vision backbone. However, because CNNs rely on spatial locality, they struggle to capture global dependencies across multiple facial units, which leads to suboptimal performance in FER tasks. To address this issue, we propose integrating a modified non-local block into CNN-based architectures. This block models non-local pixel interactions while ensuring stable dynamics, allowing for the exploration of complex non-local patterns. Our approach preserves global information and enhances CNNs’ ability to identify structural relationships between different facial units, thereby improving FER performance. We validate the method through experimental evaluations on three public FER datasets (CK+, JAFFE, and KDEF), where it outperforms baseline methods. These results suggest that enhancing CNNs with non-local modeling capabilities can lead to more accurate and robust FER systems, which would benefit applications in human-computer interaction and emotional AI systems.
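To make the idea of a non-local block concrete, the following is a minimal NumPy sketch of the standard embedded-Gaussian non-local operation (Wang et al., 2018), in which every spatial position attends to every other position and the result is added back to the input through a residual connection. It is not the modified block proposed here: the abstract does not specify the modification or how stable dynamics are enforced, so none of that is reproduced. The function name `non_local_block`, the weight names, and the toy shapes are all assumptions chosen for illustration, and the 1x1 convolutions are written as plain matrix products over the channel axis.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Standard embedded-Gaussian non-local block (illustrative sketch).

    x:                    feature map of shape (C, H, W)
    w_theta, w_phi, w_g:  1x1-conv weights of shape (C_inner, C)
    w_z:                  output projection of shape (C, C_inner)
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)        # (C, N) with N = H * W positions

    theta = w_theta @ flat            # query embedding, (C_inner, N)
    phi = w_phi @ flat                # key embedding,   (C_inner, N)
    g = w_g @ flat                    # value embedding, (C_inner, N)

    # Pairwise affinities between all positions; row i holds the
    # normalized weights that position i assigns to every position j.
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N)

    # y_i = sum_j attn[i, j] * g_j: each position aggregates global context.
    y = g @ attn.T                    # (C_inner, N)

    # Project back to C channels and add the residual connection.
    z = w_z @ y + flat
    return z.reshape(c, h, w)


# Toy usage with random weights (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
C, C_INNER, H, W = 8, 4, 6, 6
x = rng.standard_normal((C, H, W))
w_theta = rng.standard_normal((C_INNER, C)) * 0.1
w_phi = rng.standard_normal((C_INNER, C)) * 0.1
w_g = rng.standard_normal((C_INNER, C)) * 0.1
w_z = rng.standard_normal((C, C_INNER)) * 0.1

out = non_local_block(x, w_theta, w_phi, w_g, w_z)
out_identity = non_local_block(x, w_theta, w_phi, w_g, np.zeros((C, C_INNER)))
```

Because of the residual connection, the block leaves the shape of the feature map unchanged and reduces to the identity when the output projection is zero, which is why such a block can in principle be inserted between existing CNN stages without disturbing the backbone at initialization.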