Abstract: Fact-Check Retrieval (FCR) plays a crucial role in automated fact-checking by retrieving relevant fact-checked articles for disputed claims. While recent work has explored text-based, multilingual, and multimodal FCR, most efforts remain unimodal or limited to English. To bridge this gap, we introduce M3-Check, the first FCR dataset combining multilingual text and images from social media posts with fact-check articles from diverse, credible sources. Furthermore, we introduce FACTOR, a two-tower Transformer-based architecture that employs cross-tower parameter sharing and modality-wise aligned weight initialization. FACTOR outperforms zero-shot baselines, two-tower linear models, and vanilla Transformers, achieving a 17% improvement over the latter. Moreover, we conduct modality ablations and compare state-of-the-art encoders, showing that multilingual encoders such as multi-E5 can provide an additional 13% performance gain without requiring English translations.