Multi-View Contrastive Parsing Network for Emotion Recognition in Multi-Party Conversations

Published: 01 Jan 2024 · Last Modified: 12 Apr 2025 · IJCNN 2024 · CC BY-SA 4.0
Abstract: Recent Emotion Recognition in Conversation (ERC) approaches significantly outperform large language models such as ChatGPT in the dyadic conversation setting by introducing external knowledge and adjusting training strategies. However, Multi-Party Conversations (MPCs) are more complex due to their multi-thread nature, low information density, and pervasive long-range dependencies. In addition, previous studies have overlooked the phenomenon of utterance polysemy. To address these challenges, this paper proposes a Multi-View Contrastive Parsing Network (MuVCPN). Specifically, we first parse the entire conversation and extract emotion-related cues from independent sub-conversation views. Then, we update the utterance distances based on the parsing results and use a discourse structure-aware self-attention mechanism to capture the conversational information flow from the global view. At the same time, we adopt supervised contrastive learning to group utterances from the same sub-conversation together. Extensive experiments on four benchmarks show that the proposed MuVCPN model outperforms baseline models on the ERC task. Additionally, the experimental results indicate that utilizing different views and sub-conversation-level contrastive learning improves performance in the MPC setting.
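
To make the sub-conversation-level contrastive objective mentioned in the abstract concrete, below is a minimal PyTorch sketch of a supervised contrastive loss that treats utterances assigned to the same parsed sub-conversation as positives. The function name, tensor shapes, and temperature value are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def subconversation_supcon_loss(utterance_embs, subconv_ids, temperature=0.07):
    """Supervised contrastive loss over utterances in one conversation.

    utterance_embs: (N, D) utterance representations.
    subconv_ids:    (N,) sub-conversation index of each utterance,
                    e.g. produced by a conversation parser.
    """
    # Work in cosine-similarity space.
    z = F.normalize(utterance_embs, dim=1)
    sim = torch.matmul(z, z.T) / temperature  # (N, N) similarity logits

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs

    # Positives: other utterances parsed into the same sub-conversation.
    pos_mask = subconv_ids.unsqueeze(0) == subconv_ids.unsqueeze(1)
    pos_mask &= ~self_mask

    # Log-probability of each candidate against all non-self utterances.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts

    # Only anchors with at least one positive contribute to the loss.
    has_pos = pos_mask.any(dim=1)
    return per_anchor[has_pos].mean() if has_pos.any() else per_anchor.new_tensor(0.0)
```

Under this formulation, utterances from the same sub-conversation are pulled together in the embedding space while utterances from other threads act as negatives, which matches the abstract's description of grouping utterances at the sub-conversation level; how the paper combines this term with the classification loss is not specified here.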