Towards multimodal sarcasm detection via label-aware graph contrastive learning with back-translation augmentation

Published: 2024 · Last Modified: 05 Jan 2026 · Knowl. Based Syst. 2024 · CC BY-SA 4.0
Abstract: Multimodal sarcasm detection, as a sentiment analysis task, has made great strides owing to the rapid development of multimodal machine learning. However, existing graph-based studies mainly focus on capturing the atomic-aware relations between textual and visual graphs within individual instances, neglecting label-aware connections between different instances. To address this limitation, we propose a novel Label-aware Graph Contrastive Learning (LGCL) method that detects ironic cues from a label-aware perspective on multimodal data. We first construct unimodal graphs for each instance and fuse them in a shared graph semantic space to obtain multimodal graphs. Then, we introduce two label-aware graph contrastive losses, the Label-aware Unimodal Contrastive Loss (LUCL) and the Label-aware Multimodal Contrastive Loss (LMCL), to make the model aware of the shared ironic cues related to sentiment labels within multimodal graph representations. Additionally, we propose Back-translation Data Augmentation (BTrA) for both textual and visual data to enhance contrastive learning, where different back-translation schemes are designed to generate additional positive and negative samples. Experimental results on two public datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with previous methods.
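The label-aware contrastive losses described above pull together the graph representations of instances that share a sarcasm label and push apart those with different labels. As a rough illustration only (not the authors' implementation), a generic supervised, label-aware contrastive loss over a batch of embeddings can be sketched in NumPy; the function name and temperature value are illustrative assumptions:

```python
import numpy as np

def label_aware_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised (label-aware) contrastive loss.

    Instances sharing a label are treated as positives for each
    anchor; all other instances act as negatives. This is a generic
    sketch of the idea, not the paper's LUCL/LMCL code.
    """
    # L2-normalize so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # exclude each anchor's similarity to itself from the softmax
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - np.log(np.sum(np.exp(sim), axis=1, keepdims=True))

    # positives: same label, different instance
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # mean negative log-probability of positives per anchor
    per_anchor = -np.sum(np.where(pos, log_prob, 0.0), axis=1)
    per_anchor /= np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

With consistent labels (similar embeddings grouped under the same label), the loss is lower than when labels are shuffled across dissimilar embeddings, which is what drives the representations of same-label instances together during training.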