Leveraging Embedding Screening for Multimodal Multi-Hop Claims Verification

ACL ARR 2025 May Submission 3374 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: With the rapid development of generative AI and the explosive growth of the Internet, multimodal misinformation is spreading unchecked. Zero-shot claim verification is crucial for combating this problem, since checking a claim requires multi-hop reasoning over evidence spanning multiple modalities. We therefore design ES4CV, a framework that leverages Embedding Screening for multimodal multi-hop Claim Verification. It consists of two modules: one for zero-shot evidence screening and another for zero-shot claim verification. In the evidence-screening module, a General Multimodal Embedder (GME) projects both multimodal evidence and claims into a unified semantic space, where evidence is screened by its similarity to the claim. In the zero-shot claim-verification module, the filtered evidence and the claim are fed into a Vision Language Model (VLM) for the final judgment. Extensive comparative and ablation experiments on the recently released multimodal multi-hop dataset MMCV demonstrate the effectiveness and superiority of our method.
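
The screening step described in the abstract amounts to nearest-neighbour retrieval in the shared embedding space. Below is a minimal sketch of that idea, in Python, assuming pre-computed GME embeddings; the function names, the cosine-similarity criterion, and the top-k cutoff are illustrative assumptions, not the paper's confirmed implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen_evidence(claim_emb: np.ndarray,
                    evidence_embs: list[np.ndarray],
                    top_k: int = 5) -> list[int]:
    # Rank candidate evidence by similarity to the claim in the unified
    # semantic space and keep the indices of the top-k items; the retained
    # evidence would then be passed, with the claim, to a VLM for the
    # final zero-shot verdict.
    scores = [cosine_similarity(claim_emb, e) for e in evidence_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

if __name__ == "__main__":
    # Random vectors stand in for GME embeddings (dimension 768 is assumed).
    rng = np.random.default_rng(0)
    claim = rng.normal(size=768)
    evidence = [rng.normal(size=768) for _ in range(20)]
    print(screen_evidence(claim, evidence, top_k=3))
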
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications, fact checking, rumor/misinformation detection
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 3374