Multi-Relational Geometric Regularization Framework for Multi-Modal Emotion Recognition in Conversation

Tao Zhang, Zhenhua Tan

Published: 2025, ICASSP 2025. Last Modified: 31 May 2026. License: CC BY-SA 4.0
Abstract: Existing studies on multi-modal emotion recognition in conversation (MMERC) mainly focus on multi-modal fusion and context modeling for emotion representation, and are limited in uncovering the intrinsic structure of emotion-related data. The existing geometric consistency regularization (GCR) technique aims to build meaningful latent feature structures across multiple modalities and has been validated on non-conversational data; we therefore explore its application to the MMERC task, which involves conversational data. However, we find that geometric consistency in conversational data varies across speaker and conversation relations (intra-speaker, inter-speaker, and inter-conversation), probably due to differences in speaker expressions and conversation topics. This makes the direct application of GCR less effective. To address this issue, we propose a Multi-Relational Geometric Regularization Framework for MMERC (R4-MMERC). Our framework constructs geometric structures of conversational data and performs dynamically balanced consistency regularization based on these multiple speaker and conversation relations (intra-speaker, inter-speaker, and inter-conversation). We tested our framework by integrating it with three typical MMERC models on the IEMOCAP benchmark dataset. The results show significant performance improvements, demonstrating the effectiveness of our approach.
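The abstract gives no formulas for GCR or for the dynamic balancing, so the following is only a rough sketch under stated assumptions, not the R4-MMERC implementation. It assumes that geometric consistency means matching the pairwise cosine-similarity structure of utterance features between two modalities. It groups utterance pairs by the three relations named above (intra-speaker, inter-speaker, inter-conversation) and combines the per-relation losses with softmax weights over their magnitudes, which is a hypothetical stand-in for "dynamically balanced". All names (`relation`, `relational_gcr_loss`, `temperature`) are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def relation(i, j, speakers, convs):
    """Classify an utterance pair into one of the three relations named in the abstract."""
    if convs[i] != convs[j]:
        return "inter-conversation"
    if speakers[i] == speakers[j]:
        return "intra-speaker"
    return "inter-speaker"


def relational_gcr_loss(feats_a, feats_b, speakers, convs, temperature=1.0):
    """Hypothetical multi-relational geometric consistency loss between two modalities.

    For every utterance pair, the squared gap between its cosine similarity in
    modality A and in modality B is averaged within each relation. The relation
    losses are then mixed with softmax weights over their values (an assumption):
    relations whose geometry disagrees more receive a larger weight.
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(feats_a)), 2):
        gap = cosine(feats_a[i], feats_a[j]) - cosine(feats_b[i], feats_b[j])
        r = relation(i, j, speakers, convs)
        sums[r] = sums.get(r, 0.0) + gap * gap
        counts[r] = counts.get(r, 0) + 1
    per_rel = {r: sums[r] / counts[r] for r in sums}

    # Softmax over the per-relation losses gives the (assumed) dynamic weights.
    z = sum(math.exp(v / temperature) for v in per_rel.values())
    weights = {r: math.exp(v / temperature) / z for r, v in per_rel.items()}
    total = sum(weights[r] * per_rel[r] for r in per_rel)
    return total, per_rel, weights


speakers = ["A", "B", "A", "C"]
convs = [0, 0, 0, 1]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
text = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.1], [0.5, 0.5]]
total, per_rel, weights = relational_gcr_loss(audio, text, speakers, convs)
print(per_rel, weights, total)
```

In a real training loop the features would be framework tensors, the loss would be differentiated, and the balancing weights would usually be computed without gradient; this pure-Python version only shows how pairs are split by relation and how the per-relation terms are combined.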