MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

Shengkang Wang; Hongzhan Lin; Ziyang Luo; Zhen Ye; Guang Chen; Jing Ma

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma

Published: 06 Mar 2025, Last Modified: 30 Apr 2025ICLR 2025 Workshop Data Problems PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: benchmarking, evaluation, cross-modal application, multimodality

Abstract: Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning. These models embed multimodal facts within their parameters, rather than relying on external knowledge bases to store factual information explicitly. However, the content discerned by LVLMs may deviate from factuality due to inherent bias or incorrect inference. In this work, we introduce MFC-Bench, a rigorous and comprehensive benchmark designed to evaluate the factual accuracy of LVLMs across three stages of verdict prediction for multimodal fact-checking (MFC): Manipulation, Out-of-Context, and Veracity Classification. Through our evaluation on MFC-Bench, we benchmarked a dozen diverse and representative LVLMs, uncovering that current models still fall short in MFC and demonstrate insensitivity to various forms of manipulated content. We hope that MFC-Bench could raise attention to the trustworthy AI potentially assisted by LVLMs in the future.

Submission Number: 16

Loading