TL;DR: DEFAME is a modular, training-free fact-checking system that verifies open-domain text-image claims end-to-end, dynamically retrieving and integrating textual and visual evidence.
Abstract: The proliferation of disinformation demands reliable and scalable fact-checking solutions. We present **D**ynamic **E**vidence-based **FA**ct-checking with **M**ultimodal **E**xperts (DEFAME), a modular, zero-shot MLLM pipeline for open-domain, text-image claim verification. DEFAME operates in a six-stage process, dynamically selecting the tools and search depth to extract and evaluate textual and visual evidence. Unlike prior approaches that are text-only, lack explainability, or rely solely on parametric knowledge, DEFAME performs end-to-end verification, accounting for images in claims *and* evidence while generating structured, multimodal reports. Evaluation on the popular benchmarks VERITE, AVeriTeC, and MOCHEG shows that DEFAME surpasses all previous methods, establishing it as the new general state of the art for both unimodal and multimodal fact-checking. Moreover, we introduce a new multimodal benchmark, ClaimReview2024+, featuring claims that postdate the knowledge cutoff of GPT-4o, thereby avoiding data leakage. Here, DEFAME drastically outperforms GPT-4o baselines, demonstrating temporal generalizability and the potential for real-time fact-checking.
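To make the six-stage pipeline concrete, the following is a minimal Python sketch of such a plan-retrieve-judge loop. It illustrates the general idea only: the `mllm` planner/judge object, the `tools` mapping, and all class and method names are hypothetical placeholders, not DEFAME's actual API (see the linked repository for the real implementation).

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    image: bytes | None = None  # optional image accompanying the claim

@dataclass
class Action:
    tool: str   # e.g. "web_search" or "reverse_image_search"
    query: str

@dataclass
class Report:
    claim: Claim
    evidence: list[str] = field(default_factory=list)
    verdict: str = "NEI"  # "not enough information" until decided
    justification: str = ""

def verify(claim: Claim, mllm, tools: dict, max_rounds: int = 3) -> Report:
    """Plan -> retrieve -> judge, iterating until the verdict is decisive."""
    report = Report(claim)
    for _ in range(max_rounds):
        # Stages 1-2: the MLLM inspects the claim and the evidence so far,
        # then plans which tools to call next (dynamic tool selection).
        actions: list[Action] = mllm.plan(claim, report.evidence, list(tools))
        # Stage 3: execute the chosen tools to gather text/image evidence.
        for action in actions:
            report.evidence.extend(tools[action.tool](action.query))
        # Stages 4-5: summarize the evidence and attempt a verdict; looping
        # again increases the search depth when evidence is still thin.
        verdict = mllm.judge(claim, report.evidence)
        if verdict != "NEI":
            report.verdict = verdict
            break
    # Stage 6: generate the structured, multimodal report.
    report.justification = mllm.justify(claim, report.evidence, report.verdict)
    return report
```

The two design points the sketch tries to capture are the training-free use of a frozen MLLM as both planner and judge, and the loop that deepens the search only while the accumulated evidence remains inconclusive.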
Lay Summary: Misinformation is becoming more common and harder to detect—especially when it mixes text with images. People often believe what they see, and misleading image-text combinations can quickly spread across the internet. To help address this, we built a system called DEFAME that checks whether claims found online are true or false, using both text and images.
DEFAME mimics how a human fact-checker might work: it searches the web, reviews images, and cross-checks information from different sources. Unlike earlier systems that look at text alone or rely heavily on built-in memory, DEFAME uses external tools to find fresh and reliable evidence and then explains its verdict in clear, structured reports.
We tested DEFAME on standard fact-checking benchmarks and also built a new set of recent claims, all dated after models like GPT-4o were last updated. Because these claims cannot have appeared in those models' training data, they test how well a system handles fresh, real-world information. DEFAME not only beat older methods but also outperformed powerful models like GPT-4o on these newer claims, suggesting it is better suited for keeping up with breaking news and fast-spreading misinformation.
Link To Code: https://github.com/multimodal-ai-lab/DEFAME/tree/icml
Primary Area: Applications->Everything Else
Keywords: fact-checking, multimodal, claim-verification, MLLM
Submission Number: 16152