LANDMARK GUIDANCE INDEPENDENT SPATIO-CHANNEL ATTENTION AND COMPLEMENTARY CONTEXT INFORMATION BASED FACIAL EXPRESSION RECOGNITION

S Balasubramanian, Darshan Gera

Published: 30 Apr 2021, Last Modified: 13 Nov 2024, OpenReview Archive Direct Upload, License: CC BY-NC-ND 4.0

Abstract: Attention-based convolutional neural networks (CNNs) for facial expression recognition (FER) apply attention that is uniform across the spatial dimensions, the channel dimensions, or both. This leaves several issues unaddressed: (i) in the presence of occlusions and pose variations, different channels respond differently; (ii) the response intensity of a channel differs across spatial locations; (iii) attention is defined using external sources such as landmark detectors; and (iv) the features taken from a pretrained face recognition (FR) model to complement the attention branch contain redundant information. To overcome these issues, this work proposes an end-to-end architecture for FER. The architecture obtains both local and global attention per channel per spatial location through a novel spatio-channel attention net (SCAN), without seeking any information from landmark detectors. SCAN is complemented by a complementary context information (CCI) branch that builds an expression representation from the pretrained FR features. Redundancies in the FR features are eliminated using efficient channel attention (ECA). The representation learnt by the proposed architecture is robust to occlusions and pose variations, as demonstrated by the state-of-the-art performance of the proposed model on in-the-wild datasets including AffectNet, FERPlus, RAF-DB, SFEW and FED-RO. The proposed architecture also reports superior performance on in-lab datasets (CK+, Oulu-CASIA and JAFFE) and on a couple of constructed masked-face datasets resembling faces masked in the COVID-19 scenario. Code is publicly available at https://github.com/1980x/SCAN-CCI-FER.
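The abstract contrasts attention that is uniform across spatial locations (one weight per channel, as in ECA) with attention that assigns a separate weight to each channel at each spatial location, as SCAN does. The NumPy sketch below illustrates only that difference in the shape of the attention weights, under stated assumptions. `eca` follows the published ECA-Net recipe (global average pooling, a 1D convolution across channels with an adaptively sized odd kernel, then a sigmoid gate). `spatio_channel_attention` is a hypothetical scorer built from a single 1x1 convolution; it is not the authors' SCAN, whose local and global branches are not specified in this abstract. The function names, the (C, H, W) layout and the 512x7x7 feature size are illustrative assumptions.

```python
import math
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    # ECA-Net rule: k = |(log2(C) + b) / gamma|, bumped up to the next odd integer.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention on a (C, H, W) map with an odd-length 1D kernel."""
    c = features.shape[0]
    k = kernel.shape[0]
    pooled = features.mean(axis=(1, 2))          # global average pooling -> (C,)
    padded = np.pad(pooled, k // 2)             # zero padding keeps the output length C
    scores = np.array([padded[i:i + k] @ kernel for i in range(c)])  # 1D conv over channels
    gate = sigmoid(scores)                       # one weight per channel ...
    return features * gate[:, None, None]        # ... shared by every spatial location


def spatio_channel_attention(features: np.ndarray, w: np.ndarray, bias: np.ndarray):
    """HYPOTHETICAL scorer (not SCAN): a 1x1 conv with weights w (C, C) and bias (C,)."""
    scores = np.einsum("oc,chw->ohw", w, features) + bias[:, None, None]
    gate = sigmoid(scores)                       # shape (C, H, W): per channel, per location
    return features * gate, gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((512, 7, 7))     # assumed FR feature map size
    k = eca_kernel_size(feats.shape[0])
    eca_out = eca(feats, rng.standard_normal(k))
    sca_out, gate = spatio_channel_attention(
        feats, 0.01 * rng.standard_normal((512, 512)), np.zeros(512)
    )
    print(k, eca_out.shape, gate.shape)
```

Run as a script, this prints `5 (512, 7, 7) (512, 7, 7)`. Both outputs have the same shape, but the ECA gate is a vector of C values broadcast over H x W, whereas the spatio-channel gate holds C x H x W independent values. That is the extra freedom the abstract argues is needed when occlusions and pose changes make a channel's response vary from one location to another.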