Zero-Shot Warning Generation for Misinformative Multimodal Content

Giovanni Pio Delvecchio, Huy H. Nguyen, Isao Echizen

Published: 2025, Last Modified: 26 Mar 2026, WACV (Workshops) 2025, CC BY-SA 4.0

Abstract: The widespread prevalence of misinformation poses significant societal concerns. Out-of-context misinformation, where authentic images are paired with false text, is particularly deceptive and easily misleads audiences. Most existing detection methods primarily evaluate image-text consistency but often lack sufficient explanations, which are essential for effectively debunking misinformation. We present a model that detects multimodal misinformation through cross-modality consistency checks, requiring minimal training time. Additionally, we propose a lightweight model that achieves competitive performance using only one-third of the parameters. We also introduce a dual-purpose zero-shot learning task for generating contextualized warnings, enabling automated debunking and enhancing user comprehension. Qualitative and human evaluations of the generated warnings highlight both the potential and limitations of our approach.

External IDs: dblp:conf/wacv/DelvecchioNE25