Abstract: As an important task in emotion analysis, Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE) aims to extract all the emotion-cause utterance pairs from a conversation. However, the MECPE task has two shortcomings: 1) it ignores emotion utterances whose causes cannot be located in the conversation but instead require contextualized inference; 2) it cannot locate the exact causes when they occur in the visual or audio modalities rather than in the text. To address these issues, we introduce a new task named Multimodal Emotion-Cause Pair Generation in Conversations (MECPG), which aims to identify the emotion utterances in a conversation, together with their emotion categories, and to generate their corresponding causes. To tackle the MECPG task, we construct a dataset based on a benchmark corpus for MECPE. We further propose a generative framework named MONICA, which jointly performs emotion recognition and emotion cause generation with a sequence-to-sequence model. Experiments on our annotated dataset show the superiority of MONICA over several competitive systems. Our dataset and source code will be publicly released.
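To make the task definition concrete, one possible way to cast MECPG as sequence-to-sequence generation is to serialize each (emotion utterance, emotion category, cause) triple into a single target string that a decoder could learn to emit, then parse the generated string back into structured pairs for evaluation. This is a minimal sketch under stated assumptions: the `U<index> | emotion | cause` format, the `EmotionCausePair`, `linearize`, and `parse` names, and the example data are hypothetical illustrations, not MONICA's actual output format.

```python
from dataclasses import dataclass


@dataclass
class EmotionCausePair:
    utterance_id: int  # index of the emotion utterance in the conversation
    emotion: str       # emotion category, e.g. "joy"
    cause: str         # free-text cause, generated rather than extracted as a span


def linearize(pairs: list[EmotionCausePair]) -> str:
    """Serialize pairs into one decoder target string (hypothetical format)."""
    return " ; ".join(f"U{p.utterance_id} | {p.emotion} | {p.cause}" for p in pairs)


def parse(target: str) -> list[EmotionCausePair]:
    """Invert linearize(); malformed segments in a generated string are skipped."""
    pairs = []
    for segment in target.split(" ; "):
        fields = [f.strip() for f in segment.split(" | ")]
        if len(fields) != 3 or not fields[0].startswith("U") or not fields[0][1:].isdigit():
            continue
        pairs.append(EmotionCausePair(int(fields[0][1:]), fields[1], fields[2]))
    return pairs


if __name__ == "__main__":
    # Hypothetical example: the first cause comes from the audio track, not the text.
    gold = [
        EmotionCausePair(2, "surprise", "a loud crash is heard off-screen"),
        EmotionCausePair(4, "joy", "her friend agrees to help"),
    ]
    target = linearize(gold)
    print(target)
    assert parse(target) == gold
```

Because causes are free text in this sketch, a cause that itself contains the ` ; ` or ` | ` separators would not round-trip; a real system would need reserved separator tokens or escaping.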
External IDs: dblp:journals/taffco/MaYWCX25