Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations
Abstract: Emotion cause analysis has attracted increasing attention in recent years.
Extensive research has been dedicated to multimodal emotion recognition in conversations.
However, the integration of multimodal information with emotion cause remains underexplored.
Existing studies merely extract utterances or spans from conversations as cause evidence, which may not be concise or clear enough. In particular, the extracted text lacks explicit descriptions of other modalities, making it difficult to understand the causes intuitively.
To address these limitations, we introduce a new task named Multimodal Emotion Cause Generation in Conversations (MECGC), which aims to generate an abstractive summary describing the causes that trigger the given emotion based on the multimodal context of conversations.
We accordingly construct a dataset named ECGF that contains 1,374 conversations and 7,690 emotion instances from TV series.
We further develop a generative framework that first generates emotion-cause aware video captions (Observe) and then facilitates the generation of emotion causes (Generate).
The captioning model is trained with examples synthesized by a Multimodal Large Language Model (MLLM).
Experimental results demonstrate the effectiveness of our framework and the significance of multimodal information for emotion cause analysis.
Primary Subject Area: [Engagement] Emotional and Social Signals
Relevance To Conference: This work introduces a new multimodal task, MECGC, which aims to generate an abstractive summary of the emotion causes based on multimodal conversations. The constructed dataset also provides a valuable resource for training and evaluating models on this task. The proposed framework leverages three modalities (text, audio, and video) of the conversations, generating emotion-aware video captions and then reasoning about the causes. Experimental results demonstrate the effectiveness of the framework and the significance of considering multiple modalities in emotion cause reasoning.
Supplementary Material: zip
Submission Number: 4943