From Extraction to Generation: Multimodal Emotion-Cause Pair Generation in Conversations

Published: 2025, Last Modified: 25 Dec 2025, IEEE Trans. Affect. Comput. 2025, CC BY-SA 4.0
Abstract: As an important task in emotion analysis, Multimodal Emotion-Cause Pair Extraction in conversations (MECPE) aims to extract all emotion-cause utterance pairs from a conversation. However, the MECPE task has two shortcomings: 1) it ignores emotion utterances whose causes cannot be located in the conversation and must instead be inferred from context; 2) it fails to locate the exact causes that occur in the visual or audio modalities rather than in text. To address these issues, we introduce a new task named Multimodal Emotion-Cause Pair Generation in Conversations (MECPG), which aims to identify the emotion utterances in a conversation together with their emotion categories and to generate their corresponding causes. To tackle the MECPG task, we construct a dataset based on a benchmark corpus for MECPE. We further propose a generative framework named MONICA, which jointly performs emotion recognition and emotion cause generation with a sequence-to-sequence model. Experiments on our annotated dataset show that MONICA outperforms several competitive systems. Our dataset and source code will be publicly released.