Is a Picture Worth a Thousand Words? Agentic Multimodal Fact-Checking for Adaptive Use of Visual Evidence
Keywords: Multimodal Fact-Checking, Vision-Language Models, Agentic Fact-Checking, Visual Evidence Necessity
Abstract: Automated fact-checking is a crucial task not only in journalism but also across web platforms, where it supports a responsible information ecosystem and mitigates the harms of misinformation. While recent research has progressed from text-only to multimodal fact-checking, a prevailing assumption is that incorporating visual evidence universally improves performance. In this work, we challenge this assumption and show that indiscriminate use of multimodal evidence can reduce accuracy. Motivated by this finding, we propose an agentic fact-checking framework, AMuFC (Adaptive Agentic Multimodal Fact-Checking with Visual Evidence Necessity). The framework employs an Analyzer that determines whether visual evidence is necessary for claim verification and a Verifier that predicts claim veracity conditioned on both the retrieved evidence and the Analyzer's assessment. Experimental results show that incorporating the Analyzer's assessment of visual evidence necessity into the Verifier's prediction yields substantial improvements in verification performance. Case studies further demonstrate the framework's generalizability across diverse fact-checking scenarios.
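Since only the abstract is available here, the following is a minimal sketch of the two-stage Analyzer/Verifier pipeline it describes, assuming a generic model exposed as a text-in/text-out callable. Every name in the sketch (Evidence, analyze_visual_necessity, verify, fact_check, the llm parameter) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Evidence:
    texts: List[str]           # retrieved textual evidence snippets
    image_captions: List[str]  # stand-in for retrieved visual evidence

def analyze_visual_necessity(claim: str, evidence: Evidence,
                             llm: Callable[[str], str]) -> bool:
    """Analyzer: judge whether visual evidence is needed for this claim."""
    prompt = (
        f"Claim: {claim}\n"
        f"Textual evidence: {evidence.texts}\n"
        "Is visual evidence necessary to verify this claim? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")

def verify(claim: str, evidence: Evidence, needs_visual: bool,
           llm: Callable[[str], str]) -> str:
    """Verifier: predict veracity conditioned on the Analyzer's assessment."""
    visual_part = (
        f"Visual evidence: {evidence.image_captions}\n" if needs_visual
        else "Visual evidence was judged unnecessary; rely on the text only.\n"
    )
    prompt = (
        f"Claim: {claim}\n"
        f"Textual evidence: {evidence.texts}\n"
        + visual_part
        + "Verdict (supported / refuted / not enough info):"
    )
    return llm(prompt).strip()

def fact_check(claim: str, evidence: Evidence,
               llm: Callable[[str], str]) -> str:
    """End-to-end pipeline: Analyzer first, then a conditioned Verifier."""
    needs_visual = analyze_visual_necessity(claim, evidence, llm)
    return verify(claim, evidence, needs_visual, llm)
```

The one point the sketch mirrors from the abstract is that the Verifier is conditioned on the Analyzer's necessity judgment, so visual evidence is used adaptively rather than injected into every verification.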
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: misinformation detection and analysis, fact checking
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 3302