Abstract: Multi-modality physiological signal-based emotion recognition has attracted increasing attention owing to its capacity to capture human affective states comprehensively. Because of multi-modality heterogeneity and cross-subject divergence, practical applications struggle to generalize models across individuals. Effectively addressing both issues requires mitigating the gap between multi-modality signals while acquiring generalizable representations across subjects. However, existing approaches often handle these dual challenges separately, resulting in suboptimal generalization. This study introduces a novel framework, termed Correlation-Driven Multi-Modality Graph Decomposition (CMMGD). The proposed CMMGD first captures adaptive cross-modal correlations to connect each unimodal graph into a multi-modality mixed graph. To address the dual challenges simultaneously, it incorporates a correlation-driven graph decomposition module that decomposes the mixed graph into concordant and discrepant subgraphs based on the correlations. The decomposed concordant subgraph encompasses features that are consistently activated across modalities and subjects during emotion elicitation, unveiling a generalizable subspace. Additionally, we design a Multi-Modality Graph Regularized Transformer (MGRT) backbone specifically tailored for multimodal physiological signals. The MGRT alleviates the over-smoothing issue and mitigates over-reliance on any single modality. Extensive experiments demonstrate that CMMGD outperforms state-of-the-art methods by 1.79% and 2.65% on the DEAP and MAHNOB-HCI datasets, respectively, under the leave-one-subject-out cross-validation strategy.
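To make the decomposition step concrete, the following is a minimal sketch of one plausible reading of the correlation-driven graph decomposition: node embeddings from all modality channels are pooled into a mixed graph, pairwise correlations define its adjacency, and edges are split into concordant and discrepant subgraphs. The cosine-similarity correlation, the threshold `tau`, and the function name `decompose_mixed_graph` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def decompose_mixed_graph(features: torch.Tensor, tau: float = 0.5):
    """Correlation-driven decomposition of a multi-modality mixed graph.

    features: (C, D) tensor of node embeddings, where the C nodes pool
    the channels of all modalities into one mixed graph.
    Returns adjacency matrices for the concordant and discrepant subgraphs.
    """
    # Adaptive correlations: cosine similarity between node embeddings
    # (an assumed choice; the paper only states the correlations are learned).
    z = F.normalize(features, dim=-1)
    corr = z @ z.t()                       # (C, C) correlation matrix

    # Mixed-graph edge weights from correlation magnitudes.
    adj = corr.abs()

    # Strongly, positively correlated channel pairs form the concordant
    # subgraph (consistently activated across modalities/subjects);
    # the remaining edges form the discrepant subgraph.
    concordant_mask = (corr > tau).float()
    discrepant_mask = 1.0 - concordant_mask

    a_con = adj * concordant_mask          # generalizable subspace
    a_dis = adj * discrepant_mask          # modality-/subject-specific part
    return a_con, a_dis
```

Under this reading, a downstream backbone such as the MGRT would propagate messages over `a_con` to learn subject-invariant features, while `a_dis` isolates the divergent component rather than discarding it outright.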
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: We aim to contribute significantly to the topic of Emotional and Social Signals within the Engaging Users with Multimedia theme at ACM Multimedia 2024. Our manuscript analyzes emotional states through multi-modality physiological signals, introducing automated techniques for processing and interpreting human emotions. We acknowledge the substantial challenges in generalizing multimodal emotion recognition models across individuals, compounded by multi-modality heterogeneity and cross-subject divergence. These intertwined dual challenges degrade model performance in real-world applications. To address them, we introduce a novel framework termed Correlation-Driven Multi-Modality Graph Decomposition (CMMGD). The proposed CMMGD connects the graphs of individual modalities by learning adaptive channel correlations to mitigate the gap between multi-modality signals. It then decomposes the mixed graph based on these correlations to extract generalizable representations across subjects, thereby enhancing the generalization capability of the model. The effectiveness of the proposed CMMGD framework is validated through evaluation on two benchmark datasets of multimodal physiological signals. The innovation and contributions of our study make it highly pertinent to the focus and topics of the conference.
Supplementary Material: zip
Submission Number: 4862