Is a Picture Worth a Thousand Words? Agentic Multimodal Fact-Checking for Adaptive Use of Visual Evidence
Keywords: Multimodal Fact-Checking, Vision-Language Models, Agentic Fact-Checking, Visual Evidence Necessity
Abstract: Automated fact-checking is a crucial task not only in journalism but also across web platforms, where it supports a responsible information ecosystem and mitigates the harms of misinformation. While recent research has progressed from text-only to multimodal fact-checking, a prevailing assumption is that incorporating visual evidence universally improves performance. In this work, we challenge this assumption and show that indiscriminate use of multimodal evidence can reduce accuracy. Motivated by this finding, we propose an agentic fact-checking framework, AMuFC (Adaptive Agentic Multimodal Fact-Checking with Visual Evidence Necessity). The framework employs an Analyzer that determines whether visual evidence is necessary for claim verification and a Verifier that predicts claim veracity conditioned on both the retrieved evidence and the Analyzer's assessment. Experimental results show that incorporating the Analyzer's assessment of visual evidence necessity into the Verifier's prediction yields substantial improvements in verification performance. Case studies further demonstrate the framework's generalizability across diverse fact-checking scenarios.
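Since only the abstract is available here, the following is a minimal sketch of the two-stage Analyzer/Verifier pipeline it describes, assuming a generic model exposed as a text-in/text-out callable. Every name in the sketch (Evidence, analyze_visual_necessity, verify, fact_check, the llm parameter) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Evidence:
    texts: List[str]           # retrieved textual evidence snippets
    image_captions: List[str]  # stand-in for retrieved visual evidence

def analyze_visual_necessity(claim: str, evidence: Evidence,
                             llm: Callable[[str], str]) -> bool:
    """Analyzer: judge whether visual evidence is needed for this claim."""
    prompt = (
        f"Claim: {claim}\n"
        f"Textual evidence: {evidence.texts}\n"
        "Is visual evidence necessary to verify this claim? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")

def verify(claim: str, evidence: Evidence, needs_visual: bool,
           llm: Callable[[str], str]) -> str:
    """Verifier: predict veracity conditioned on the Analyzer's assessment."""
    visual_part = (
        f"Visual evidence: {evidence.image_captions}\n" if needs_visual
        else "Visual evidence was judged unnecessary; rely on the text only.\n"
    )
    prompt = (
        f"Claim: {claim}\n"
        f"Textual evidence: {evidence.texts}\n"
        + visual_part
        + "Verdict (supported / refuted / not enough info):"
    )
    return llm(prompt).strip()

def fact_check(claim: str, evidence: Evidence,
               llm: Callable[[str], str]) -> str:
    """End-to-end pipeline: Analyzer first, then a conditioned Verifier."""
    needs_visual = analyze_visual_necessity(claim, evidence, llm)
    return verify(claim, evidence, needs_visual, llm)
```

The one point the sketch mirrors from the abstract is that the Verifier is conditioned on the Analyzer's necessity judgment, so visual evidence is used adaptively rather than injected into every verification.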
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: misinformation detection and analysis, fact checking
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 3302