Keywords: Multimodal Sarcasm Detection
Abstract: Multimodal sarcasm detection (MSD) models often overfit to in-domain data because they fail to properly understand the data, which may contain slang or memes in the text and hard-to-interpret images. Moreover, existing methods focus only on inconsistencies in the data while ignoring the diversity of sarcastic expressions. To address this, we propose a novel method named **M**ulti-**P**erspective feature modeling for **M**ultimodal **S**arcasm **D**etection (MPMSD). Specifically, we first use multimodal large language models (MLLMs) to generate relevant knowledge that enhances understanding of the data. Then, based on the generated and original data, MPMSD models diverse types of sarcasm from three perspectives: Knowledge Learning, Incongruity Mining, and Representation Enhancement. Experiments demonstrate that our approach not only outperforms the state of the art (SOTA) but also exhibits strong generalization ability and robust noise resistance.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7294