Leveraging Embedding Screening for Multimodal Multi-Hop Claims Verification

ACL ARR 2025 May Submission 3374 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: With the rapid development of generative AI and the explosive growth of the Internet, multimodal misinformation is spreading unchecked. Zero-shot claim verification is crucial for combating this problem, since checking a claim requires multi-hop reasoning over evidence spanning multiple modalities. We therefore design ES4CV, a framework that leverages Embedding Screening for multimodal multi-hop Claim Verification. It consists of two modules: one for zero-shot evidence screening and another for zero-shot claim verification. In the evidence-screening module, a General Multimodal Embedder (GME) projects both multimodal evidence and claims into a unified semantic space, where evidence is screened by its similarity to the claim. In the zero-shot claim-verification module, the filtered evidence and the claim are fed into a Vision Language Model (VLM) for the final judgment. Extensive comparative and ablation experiments on the recently released multimodal multi-hop dataset MMCV demonstrate the effectiveness and superiority of our method.
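
The screening step described in the abstract amounts to nearest-neighbour retrieval in the shared embedding space. Below is a minimal sketch of that idea, in Python, assuming pre-computed GME embeddings; the function names, the cosine-similarity criterion, and the top-k cutoff are illustrative assumptions, not the paper's confirmed implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_evidence(claim_emb: np.ndarray,
                    evidence_embs: list[np.ndarray],
                    top_k: int = 5) -> list[int]:
    # Rank candidate evidence by similarity to the claim in the unified
    # semantic space and keep the indices of the top-k items; the retained
    # evidence would then be passed, with the claim, to a VLM for the
    # final zero-shot verdict.
    scores = [cosine_similarity(claim_emb, e) for e in evidence_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

if __name__ == "__main__":
    # Random vectors stand in for GME embeddings (dimension 768 is assumed).
    rng = np.random.default_rng(0)
    claim = rng.normal(size=768)
    evidence = [rng.normal(size=768) for _ in range(20)]
    print(screen_evidence(claim, evidence, top_k=3))
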
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications, fact checking, rumor/misinformation detection
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 3374