Refining Valence-Arousal Estimation with Dual-Stream Label Density Smoothing

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, ICCE 2024, CC BY-SA 4.0
Abstract: Emotion recognition from facial expressions is a long-standing research pursuit, yet challenges persist, particularly in dynamic real-world scenarios. In-the-wild datasets have limited emotion annotations because labeling is resource-intensive, which hinders the advancement of multi-task methods. Recent years have seen a surge of approaches to valence-arousal estimation; however, data imbalance, especially in valence-arousal annotations, remains a problem. This work proposes a novel two-stream valence-arousal estimation network, inspired by MIMAMO Net, that combines spatial and temporal learning to improve emotion recognition. Label Density Smoothing (LDS) is introduced to counter skewed label distributions. Experiments demonstrate the approach's effectiveness, achieving a Concordance Correlation Coefficient (CCC) of 0.591 for valence and 0.617 for arousal on the Aff-Wild2 validation set. This work contributes to the advancement of valence-arousal modeling in facial expression recognition.
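The abstract relies on two quantitative ingredients: the CCC used for evaluation and a label-smoothing step used against skewed label distributions. The sketch below is illustrative only and is not the authors' code. `ccc` follows the standard concordance correlation coefficient, 2·cov(x, y) / (σx² + σy² + (μx − μy)²), computed with population moments. `lds_weights` assumes that LDS works like label distribution smoothing from imbalanced regression: histogram the continuous labels, convolve the histogram with a Gaussian kernel, and weight each sample by the inverse of its smoothed density. The function names, the [-1, 1] label range, the bin count, the kernel size and sigma are all hypothetical choices.

```python
import math


def ccc(pred, target):
    """Concordance Correlation Coefficient using population (1/n) moments."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    mx, my = sum(pred) / n, sum(target) / n
    vx = sum((p - mx) ** 2 for p in pred) / n
    vy = sum((t - my) ** 2 for t in target) / n
    cov = sum((p - mx) * (t - my) for p, t in zip(pred, target)) / n
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        return float("nan")  # undefined: both sequences constant and equal
    return 2.0 * cov / denom


def gaussian_kernel(size, sigma):
    """Symmetric 1-D Gaussian kernel of odd length `size`, normalized to sum to 1."""
    half = size // 2
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(k)
    return [v / total for v in k]


def lds_weights(labels, n_bins=20, lo=-1.0, hi=1.0, ks=5, sigma=2.0):
    """Hypothetical LDS-style per-sample weights (inverse smoothed label density).

    Assumes labels lie in [lo, hi]; all defaults are illustrative, not from the paper.
    """
    width = (hi - lo) / n_bins

    def bin_of(y):
        # Clamp so that y == hi (or slight overshoot) lands in a valid bin.
        return min(max(int((y - lo) / width), 0), n_bins - 1)

    bins = [bin_of(y) for y in labels]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1

    # Smooth the empirical histogram with a Gaussian kernel (zero padding at edges).
    kernel = gaussian_kernel(ks, sigma)
    half = ks // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for j, kv in enumerate(kernel):
            src = i + j - half
            if 0 <= src < n_bins:
                acc += kv * counts[src]
        smoothed.append(acc)

    # Every occupied bin has a positive smoothed density, so this never divides by 0.
    raw = [1.0 / smoothed[b] for b in bins]
    scale = len(raw) / sum(raw)  # rescale so the mean weight is 1
    return [w * scale for w in raw]
```

In training, such weights would typically multiply a per-sample regression loss so that rare valence or arousal values contribute more; the smoothing step keeps neighboring label values from receiving very different weights just because of sparse bin counts.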