Bilevel Relational Graph Representation Learning-based Multimodal Emotion Recognition in Conversation
Abstract: Emotion recognition in conversation (ERC) is a key research topic in natural language processing, helping computers understand human emotions. Although deep learning methods have made substantial strides, the use of graphs in multimodal ERC is still in its infancy. The bottleneck of graph-based methods lies in the neighborhood aggregation strategy, the mechanism through which node attributes are transmitted and gathered. Existing strategies suffer from redundant, irrelevant information that interferes with the discriminative information of nodes. Moreover, traditional single-layer graph convolutional networks struggle to efficiently extract long-range contextual information. To address these issues, we propose a multimodal conversational emotion recognition approach based on a bilevel relational graph (BiGraph). Specifically, we construct two graphs: a global affinity graph, formed by clustering nodes according to the similarity between each target node and its neighbors so as to preserve discriminative information, and a local context dependency graph built from information provided by different speakers. The edges of these graphs are mainly determined by speaker context and temporal relations. Extensive experiments on the IEMOCAP and MOSEI datasets yield compelling results, demonstrating the effectiveness and superiority of the proposed model. Our code is available at https://github.com/LiMei0329/BiGraph.
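The abstract describes two graph-construction steps: a global affinity graph built from node similarity, and a local context dependency graph whose edges encode speaker and temporal relations. The following is a minimal sketch of how such adjacency matrices could be constructed; the function names, cosine-similarity measure, similarity threshold, and context window size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def build_affinity_graph(feats, threshold=0.5):
    """Global affinity graph (assumed construction): connect utterance
    nodes whose cosine similarity exceeds a threshold."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    sim = (feats @ feats.T) / (norms @ norms.T + 1e-8)
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def build_context_graph(speakers, window=2):
    """Local context dependency graph (assumed construction): link each
    utterance to the previous `window` utterances, labeling edges by
    same-speaker (1.0) vs. cross-speaker (2.0) relation."""
    n = len(speakers)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(max(0, i - window), i):
            adj[i, j] = 1.0 if speakers[i] == speakers[j] else 2.0
    return adj

# Toy conversation: three utterances, two speakers
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
affinity = build_affinity_graph(feats)          # utterances 0 and 1 are similar
context = build_context_graph(["A", "B", "A"])  # typed speaker/temporal edges
```

In a full pipeline, both adjacency matrices would feed a graph convolutional network over the multimodal utterance features; this sketch only illustrates the bilevel edge structure.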