Solution for 8th Competition on Affective & Behavior Analysis in-the-wild

Jun Yu, Yunxiang Zhang, Xilong Lu, Yang Zheng, Yongqi Wang, Lingsi Zhu

Published: 2025 (CoRR 2025), Last Modified: 27 Feb 2026, License: CC BY-SA 4.0
Abstract: In this report, we present our solution for the Action Unit (AU) Detection Challenge of the 8th Competition on Affective Behavior Analysis in-the-wild (ABAW). To achieve robust and accurate classification of facial action units in the wild, we introduce a method that leverages audio-visual multimodal data. Our method employs ConvNeXt as the image encoder and uses Whisper to extract Mel spectrogram features. These features are combined by a Transformer encoder-based fusion module that integrates the affective information embedded in the audio and image features. The fusion module supplies rich, high-dimensional feature representations to a subsequent multilayer perceptron (MLP) trained on the Aff-Wild2 dataset, improving AU detection accuracy.
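The data flow described in the abstract (image features, audio features, Transformer-based fusion, MLP head) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature widths (`D_IMG`, `D_AUD`, `D_MODEL`), the single-layer, single-head encoder without positional encodings, fusion by concatenating the two modalities into one token sequence, the per-frame head applied to the image tokens, the 0.5 decision threshold and the choice of 12 AU outputs are all assumptions made for illustration. Random arrays stand in for ConvNeXt and Whisper outputs and random weights stand in for trained parameters, so the probabilities are meaningless; only the shapes and data flow matter.

```python
import numpy as np

# Hypothetical sizes (assumptions, not taken from the report).
D_IMG = 1024   # assumed width of pooled ConvNeXt features per frame
D_AUD = 512    # assumed width of Whisper encoder features per audio step
D_MODEL = 256  # assumed shared fusion width
N_AU = 12      # assumed number of AUs scored

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature axis (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init(d_in, d_out):
    """Random weights standing in for trained parameters."""
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)

params = {
    "proj_img": init(D_IMG, D_MODEL),  # image features -> shared width
    "proj_aud": init(D_AUD, D_MODEL),  # audio features -> shared width
    "wq": init(D_MODEL, D_MODEL), "wk": init(D_MODEL, D_MODEL),
    "wv": init(D_MODEL, D_MODEL), "wo": init(D_MODEL, D_MODEL),
    "ff1": init(D_MODEL, 4 * D_MODEL), "ff2": init(4 * D_MODEL, D_MODEL),
    "mlp1": init(D_MODEL, D_MODEL), "mlp2": init(D_MODEL, N_AU),
}

def encoder_layer(x, p):
    """One pre-norm Transformer encoder layer with single-head self-attention."""
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # every token attends to both modalities
    x = x + (attn @ v) @ p["wo"]                    # residual around attention
    h = layer_norm(x)
    return x + np.maximum(h @ p["ff1"], 0.0) @ p["ff2"]  # residual around ReLU feed-forward

def predict_aus(img_feats, aud_feats, p, threshold=0.5):
    """img_feats: (T_img, D_IMG), aud_feats: (T_aud, D_AUD).
    Returns per-frame AU probabilities and binary labels, each (T_img, N_AU)."""
    t_img = img_feats.shape[0]
    tokens = np.concatenate([img_feats @ p["proj_img"], aud_feats @ p["proj_aud"]], axis=0)
    fused = layer_norm(encoder_layer(tokens, p))[:t_img]  # keep the image-token positions
    logits = np.maximum(fused @ p["mlp1"], 0.0) @ p["mlp2"]
    probs = 0.5 * (1.0 + np.tanh(0.5 * logits))  # overflow-free form of the sigmoid
    return probs, (probs >= threshold).astype(int)

# Usage with stand-in features: 8 video frames, 20 audio steps.
img = rng.standard_normal((8, D_IMG))
aud = rng.standard_normal((20, D_AUD))
probs, labels = predict_aus(img, aud, params)
print(probs.shape, labels.shape)  # (8, 12) (8, 12)
```

Because AU detection is multi-label, each AU gets an independent sigmoid rather than a softmax across AUs; in a trained system the per-AU thresholds could be tuned on validation data instead of being fixed at 0.5.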