An End-to-End Graph-Guided Spatiotemporal Model for Adaptive Frame-Level Facial Affect Analysis in the Wild

Yan Liang, Yan Hao, Zenan Yao, Jiacheng Liao, Jiahui Pan

Published: 2025, Last Modified: 09 Nov 2025. Venue: ICASSP 2025. License: CC BY-SA 4.0

Abstract: Human emotional states in real life are varied and complex. Existing methods struggle to capture robust facial expression features dynamically, especially under large head poses and occlusion. This paper proposes a novel end-to-end graph-guided spatiotemporal convolutional network (GSTCN) for frame-level facial affect analysis. The GSTCN uses a graph structure to model face images with missing data and extracts both spatial and temporal information from expression sequences. Moreover, a trajectory oscillating coefficient algorithm is designed to evaluate the motion of facial landmarks and to help the GSTCN adaptively build graph structures over the most representative facial regions according to the dynamic emotion features, which improves the robustness of human affect analysis in the wild. The proposed method is evaluated on two large-scale databases, Aff-Wild2 and AFEW-VA, and the experimental results demonstrate its superiority over state-of-the-art approaches.

External IDs: dblp:conf/icassp/LiangHYLP25