Facial Action Unit Detection with the Semantic Prompt

Published: 2024, Last Modified: 07 Nov 2025, Venue: ICME 2024, License: CC BY-SA 4.0
Abstract: Facial action unit (AU) detection is an essential technique for fine-grained facial expression analysis. To improve detection performance, the detection network should exploit the associations among different AUs. In light of this, we propose a novel AU prompt framework that exploits the semantic correlations between AUs to improve detection accuracy. Specifically, we use a pre-trained text encoder to extract textual embeddings of the AU descriptions. We then treat these embeddings as semantic prompts and feed them into a vision-language cross-attention module to capture the relations among AUs. The cross-attention module adaptively aggregates the spatial features produced by a face image encoder and generates discriminative features for each AU. Extensive experiments on the BP4D, DISFA, and GFT datasets demonstrate that the proposed framework outperforms state-of-the-art methods in both within-dataset and cross-dataset settings.
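The aggregation step described in the abstract can be illustrated with a minimal sketch of single-head cross-attention, in which each AU prompt embedding acts as a query over the spatial features of a face image. This is not the authors' implementation. The function name `cross_attention`, the dimensions, and the random vectors standing in for the text-encoder and image-encoder outputs are all assumptions made for illustration, and the learned projections and any modelling of relations among AUs are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(prompts, spatial):
    """Aggregate spatial features once per prompt (hypothetical helper).

    prompts: list of num_aus vectors of length d (stand-ins for AU text embeddings)
    spatial: list of num_locations vectors of length d (stand-ins for image features)
    Returns (features, weights): one aggregated d-vector and one attention
    distribution over locations for each prompt.
    """
    dim = len(prompts[0])
    scale = 1.0 / math.sqrt(dim)
    features, weights = [], []
    for query in prompts:
        # Scaled dot-product score between this prompt and every location.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in spatial]
        attn = softmax(scores)
        # Weighted sum of the spatial features gives the per-AU feature.
        feat = [sum(a * loc[c] for a, loc in zip(attn, spatial)) for c in range(dim)]
        features.append(feat)
        weights.append(attn)
    return features, weights


# Assumed sizes, chosen arbitrarily for the example.
NUM_AUS, NUM_LOCATIONS, DIM = 12, 49, 8
random.seed(0)
prompts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_AUS)]
spatial = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LOCATIONS)]

au_features, attn_weights = cross_attention(prompts, spatial)
print(len(au_features), len(au_features[0]))  # one DIM-length feature per AU
```

In a full model, each per-AU feature would presumably feed a classifier that predicts whether that AU is active; that head is left out here because the abstract does not describe it.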