LDGNet: LLMs Debate-Guided Network for Multimodal Sarcasm Detection

Published: 2025, Last Modified: 03 Nov 2025, ICASSP 2025, CC BY-SA 4.0
Abstract: Multimodal sarcasm detection aims to uncover sarcasm expressed through multiple modalities, such as text and images. Previous work has made valuable progress in detecting sarcastic sentiment within given domains. However, a gap remains in using deeper contextual information to capture elusive sarcastic clues hidden in open-world knowledge, such as history, politics, and everyday common sense, which previous models have not addressed. To close this gap, a natural idea is to simulate a debate in which debaters with different viewpoints and judges collaboratively drive the judgment of emotional expressions. Benefiting from the development of large multimodal language models and building on previous advances, we propose a novel framework called the LLMs Debate-Guided Network (LDGNet) for multimodal sarcasm detection. LDGNet leverages large language model debates to uncover subtle emotional information and uses a Judge Network to make more reliable and accurate sentiment judgments. Extensive experiments on in-domain and out-of-distribution (OOD) datasets validate the superiority of the proposed method.