Abstract: In affective computing, fully leveraging information from multiple sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by how the human brain processes emotions and by the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. UMBEnet's core design comprises a Dual-Stream (DS) structure that fuses intrinsic prompts with a Prompt Pool, and a Sparse Feature Fusion (SFF) module. The Prompt Pool integrates information from different modalities, while the intrinsic prompts strengthen the system's predictive guidance and manage knowledge related to emotion classification. Moreover, because effective information is sparse across modalities, the SFF module exploits all available sensory data through the sparse integration of modality fusion prompts and intrinsic prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in Dynamic Facial Expression Recognition (DFER), including DFEW, FERV39k, and MAFW, show that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in both modality-missing and multimodal scenarios, UMBEnet significantly surpasses the leading methods, demonstrating outstanding performance and adaptability on tasks that involve complex emotional understanding with rich multimodal information.
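A minimal sketch of how the Prompt Pool, intrinsic prompts, and sparse fusion described above could interact, assuming a PyTorch implementation with top-k gating; all names, dimensions, and the gating mechanism (SparsePromptFusion, pool_size, top_k) are illustrative assumptions, not UMBEnet's actual design:

```python
# Hypothetical sketch: a learnable Prompt Pool queried by per-modality
# features, a set of intrinsic prompts, and sparse (top-k) fusion of the
# two. Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparsePromptFusion(nn.Module):
    def __init__(self, dim=512, pool_size=16, num_intrinsic=8, top_k=4):
        super().__init__()
        # Prompt Pool: shared prompts that integrate cross-modal information.
        self.prompt_pool = nn.Parameter(torch.randn(pool_size, dim))
        # Intrinsic prompts: emotion-class knowledge for predictive guidance.
        self.intrinsic = nn.Parameter(torch.randn(num_intrinsic, dim))
        self.top_k = top_k

    def forward(self, modality_feats):
        # modality_feats: (batch, num_modalities, dim), e.g. visual + audio.
        query = modality_feats.mean(dim=1)                # (B, D)
        scores = query @ self.prompt_pool.t()             # (B, P)
        # Sparse selection: keep only the top-k matching pool prompts,
        # reflecting the sparsity of effective cross-modal information.
        topv, topi = scores.topk(self.top_k, dim=-1)      # (B, k)
        weights = F.softmax(topv, dim=-1).unsqueeze(-1)   # (B, k, 1)
        selected = self.prompt_pool[topi]                 # (B, k, D)
        fusion_prompt = (weights * selected).sum(dim=1)   # (B, D)
        # Combine the modality fusion prompt with the intrinsic prompts.
        return fusion_prompt.unsqueeze(1) + self.intrinsic.unsqueeze(0)

# Usage: fuse features from two modalities for a batch of 2 clips.
feats = torch.randn(2, 2, 512)
fused = SparsePromptFusion()(feats)   # (2, 8, 512)
```

One design note on this sketch: routing each query through only the top-k pool prompts keeps the fusion sparse, so a missing modality degrades the query rather than invalidating the whole fusion path.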
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Engagement] Emotional and Social Signals, [Content] Multimodal Fusion, [Content] Vision and Language
Relevance To Conference: This paper introduces a multimodal approach tailored to Dynamic Facial Expression Recognition (DFER), primarily addressing the challenges of modality absence and multimodal fusion in the field. Inspired by the theory of cross-modal plasticity, we develop a brain-like unified modal paradigm that significantly outperforms current state-of-the-art (SOTA) methods in both multimodal and missing-modality scenarios. UMBEnet demonstrates remarkable performance and adaptability on tasks that involve interpreting complex emotions from rich multimodal information.
Supplementary Material: zip
Submission Number: 40