Keywords: Multimodal Sarcasm Detection
Abstract: Multimodal sarcasm detection (MSD) models often overfit to in-domain data because they fail to properly understand the data, which may contain slang or memes in the text and hard-to-interpret images. Moreover, existing methods focus only on inconsistencies in the data while ignoring the diversity of sarcastic expressions. To address this, we propose a novel method named **M**ulti-**P**erspective feature modeling for **M**ultimodal **S**arcasm **D**etection (MPMSD). Specifically, we first use multimodal large language models (MLLMs) to generate relevant knowledge that enhances understanding of the data. Then, based on the generated and original data, MPMSD models diverse types of sarcasm from three perspectives: Knowledge Learning, Incongruity Mining, and Representation Enhancement. Experiments demonstrate that our approach not only outperforms the state of the art (SOTA) but also exhibits strong generalization ability and robust noise resistance.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7294