Abstract: Multi-modal sarcasm detection is crucial for understanding human communication, and a key aspect of it is the analysis of emotion incongruity. However, progress in video emotion analysis is hindered by the scarcity of labeled datasets, which are limited in both scale and diversity because human annotation is costly. In this paper, to handle the diverse emotion distributions found in open-topic video data, we propose a simple yet effective method named Prompted Emotion Distribution Enhancement (PEDE). The method leverages large-scale pre-trained models to generate emotion distributions, which enrich the input features of sarcasm detection models. Intra- and inter-modality emotion graphs are then constructed, and a graph attention network (GAT) is used to learn the emotion incongruity in the input. Extensive experiments demonstrate that our approach significantly improves the performance of existing multi-modal sarcasm detection approaches on a sarcasm video dataset.
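To make the graph step concrete, the following is a minimal sketch of one single-head graph attention pass over per-modality emotion distributions. It is an illustration under stated assumptions, not the paper's implementation: the modality names, the three-emotion label set, the fully connected graph, the identity feature projection, and the fixed attention vector `attn_vec` are all hypothetical placeholders (in a trained GAT the projection and attention vector would be learned).

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU as used in the standard GAT attention score."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(features, neighbors, attn_vec):
    """One single-head graph attention pass with identity projection (W = I).

    features:  dict node -> list of floats (here, an emotion distribution)
    neighbors: dict node -> list of neighbor nodes (self-loops included)
    attn_vec:  list of floats of length 2 * feature_dim (the vector `a`)
    Returns (updated node features, attention weights per node).
    """
    updated, weights = {}, {}
    for i, h_i in features.items():
        # Score each neighbor j with LeakyReLU(a^T [h_i || h_j]).
        scores = []
        for j in neighbors[i]:
            concat = h_i + features[j]
            scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vec, concat))))
        alpha = softmax(scores)
        weights[i] = dict(zip(neighbors[i], alpha))
        # Aggregate neighbor features weighted by attention.
        updated[i] = [
            sum(a * features[j][k] for a, j in zip(alpha, neighbors[i]))
            for k in range(len(h_i))
        ]
    return updated, weights


# Hypothetical emotion distributions over (happy, angry, neutral).
features = {
    "text": [0.8, 0.1, 0.1],
    "audio": [0.1, 0.7, 0.2],
    "face": [0.2, 0.6, 0.2],
}
# Assumed fully connected inter-modality graph with self-loops.
neighbors = {node: list(features) for node in features}
# Placeholder attention vector (would be learned in practice).
attn_vec = [1.0, -1.0, 0.0, -1.0, 1.0, 0.0]

updated, weights = gat_layer(features, neighbors, attn_vec)
```

With this placeholder attention vector, the text node attends more strongly to modalities whose emotion distribution disagrees with its own, which is one simple way to picture how attention over such a graph could surface incongruity.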
External IDs: dblp:conf/icassp/ZhangCLL25