Abstract: Multimodal Emotion Recognition in Conversations (MERC) aims to detect the emotion expressed in each utterance of a conversational video. Graph-based methods are widely employed in MERC due to their superiority in modeling intricate speaker-sensitive and context-sensitive dependencies in conversations. Despite promising advancements, existing graph-based methods suffer from two inherent issues arising from their reliance on manually predefined graph structures: structural redundancy, which burdens models with irrelevant noise aggregation, and insufficient connections, which result in a lack of cross-modal contextual cues. To address these issues, we propose a novel graph structure learning framework for MERC comprising two key components: Context-aware Graph Sparsification (CGS) and Implicit Graph Relation Mining (IGR). CGS employs an edge selection network to refine the manually predefined graph, filtering out noisy information caused by structural redundancy. IGR explores potential connections that are beneficial for emotional reasoning. Experimental results on two datasets show that our proposed framework significantly improves the performance of graph-based methods in MERC.
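The two components can be illustrated with a minimal sketch. The abstract does not specify the scoring functions, so the code below uses hypothetical cosine-similarity criteria as stand-ins: the paper's CGS uses a learned edge selection network, and IGR's relation-mining criterion is likewise more sophisticated. The function names, thresholds, and NumPy-only setup are all assumptions for illustration.

```python
import numpy as np

def sparsify_edges(feats, edges, keep_ratio=0.5):
    """CGS-style sketch: score each predefined edge by the cosine
    similarity of its endpoint utterance features and keep only the
    top fraction, discarding redundant (noisy) connections.
    Hypothetical scoring; the paper uses a learned edge selector."""
    scores = []
    for i, j in edges:
        a, b = feats[i], feats[j]
        scores.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    k = max(1, int(len(edges) * keep_ratio))
    top = np.argsort(scores)[::-1][:k]          # indices of highest-scoring edges
    return [edges[t] for t in sorted(top)]

def mine_implicit_edges(feats, edges, threshold=0.9):
    """IGR-style sketch: propose connections absent from the predefined
    graph whose endpoint features are highly similar (hypothetical
    cosine-similarity criterion)."""
    existing = set(map(tuple, edges))
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                      # pairwise cosine similarities
    n = len(feats)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if (i, j) not in existing and sim[i, j] > threshold]
```

In this toy form, sparsification prunes the predefined edge set while relation mining augments it; the actual framework learns both decisions end-to-end with the emotion classifier.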
External IDs: dblp:conf/icmcs/XiongTZWCLYYX25