Reading the Lines, Decoding the Minds: Explainable Satirical Cartoon Detection via On-Canvas and Beyond-Canvas Modeling
Keywords: Multimodal Reasoning, Satire Detection, Satirical Cartoons
Abstract: Multimodal Satire Detection (MSD) aims to identify implicit criticism and stance in social media. However, existing MSD benchmarks mainly use user-generated content and predominantly consist of realistic photographs, where satire is often conveyed through overt image-text incongruity. By contrast, increasingly popular editorial cartoons convey satire through symbolic and metaphorical cues, requiring models to infer latent intent and pragmatics.
To address these limitations, we propose MSCE, a dedicated evaluation benchmark for Multimodal Satirical Cartoons Evaluation.
As a strong baseline for MSCE, we introduce a novel explainable framework named DOUBLE.
DOUBLE endows a Multimodal Large Language Model (MLLM) with a Clue2View mechanism to explicitly model interpretation through two complementary views: a literal on-canvas view and a metaphorical beyond-canvas view.
In addition, DOUBLE incorporates a lightweight Small Language Model Arbiter to distill reasoning traces from the MLLM, ensuring reliable predictions with lower computational costs.
Experiments on MSCE demonstrate that DOUBLE achieves the best performance, showcasing its ability to provide clear, well-grounded rationales for complex satirical cartoons.
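As a rough illustration of the two-view idea described above, the following minimal Python sketch shows how visual clues might be turned into an on-canvas and a beyond-canvas reading, and then passed to a small arbiter for the final decision. This is not the authors' implementation: the names `TwoViewReading`, `Verdict`, `clue_to_view`, and `arbitrate`, the prompt wording, and the 0.5 threshold are all hypothetical, and the MLLM and arbiter are represented by plain callables that the caller must supply.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TwoViewReading:
    on_canvas: str      # literal reading: what is drawn on the canvas
    beyond_canvas: str  # metaphorical reading: what the drawing implies


@dataclass
class Verdict:
    is_satirical: bool
    rationale: str


def clue_to_view(clues: List[str], describe: Callable[[str], str]) -> TwoViewReading:
    """Build both views from the same clues.

    `describe` stands in for an MLLM call (hypothetical interface):
    it takes a prompt string and returns generated text.
    """
    joined = "; ".join(clues)
    return TwoViewReading(
        on_canvas=describe(f"Literal view. Describe what is depicted: {joined}"),
        beyond_canvas=describe(f"Metaphorical view. Explain what is implied: {joined}"),
    )


def arbitrate(
    reading: TwoViewReading,
    arbiter: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Verdict:
    """Score the combined reasoning trace with a small arbiter model.

    `arbiter` stands in for the lightweight language model (hypothetical
    interface): it maps a reasoning trace to a satire score in [0, 1].
    """
    trace = (
        f"ON-CANVAS: {reading.on_canvas}\n"
        f"BEYOND-CANVAS: {reading.beyond_canvas}"
    )
    score = arbiter(trace)
    return Verdict(is_satirical=score >= threshold, rationale=trace)
```

In use, one would call `clue_to_view` with clues extracted from a cartoon and a function wrapping the MLLM, then hand the result to `arbitrate` with a function wrapping the arbiter. The returned `rationale` keeps both views together, so the prediction can be inspected alongside its explanation.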
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Information Extraction
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 5577