Hierarchical Multi-Modal Sarcasm Detection with Dual-Layer Associated Incongruity Learning

ACL ARR 2026 January Submission 5279 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Multi-modal Sarcasm Detection, Sentiment Incongruity, Multimodal Fusion, Natural Language Processing, Self-constructed Multimodal Sarcasm Dataset

Abstract: Multi-modal sarcasm detection aims to infer the true intent behind content, improving how well deep learning models understand text-image pairs. Most existing approaches use graph-based or attention-based methods to model the likelihood of sarcasm. Although these methods highlight semantic incongruity, they do not fully capture the nuanced characteristics of sarcastic rhetoric. Because sarcasm combines intense emotional expression with complex linguistic patterns, its incongruity manifests differently at the word level and at the sentence level. From a modeling perspective, the proportional distribution of sarcastic content also affects detection performance. To address these issues, this paper proposes a dual-layer associated incongruity method for multi-modal sarcasm detection. We also construct a standardized multi-modal sarcasm dataset from Amazon product reviews. Extensive experiments on classic benchmarks and on our self-constructed dataset validate the proposed method, which achieves a 0.3% accuracy improvement over other state-of-the-art models, indicating that it is adaptable and robust for sarcasm detection.

Paper Type: Long

Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Research Area Keywords: Multi-modal Sarcasm

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 5279
