Abstract: Online misinformation is often multimodal in nature, i.e., caused by misleading associations between texts and accompanying images. To support the fact-checking process, researchers have recently been developing automatic multimodal methods that gather and analyze external information (evidence) related to the image–text pairs under examination. However, prior works incorrectly assumed that all external information collected from the Web is relevant. In this study, we introduce a “relevant evidence detection” (RED) module to discern whether each piece of evidence is relevant, i.e., whether it supports or refutes the claim. Specifically, we develop the “relevant evidence detection directed transformer” (RED-DOT) and explore multiple architectural variants (e.g., single- or dual-stage) and mechanisms (e.g., “guided attention”). Extensive ablation and comparative experiments demonstrate that RED-DOT outperforms the state of the art (SotA), achieving up to a 33.7% accuracy improvement on the VERITE benchmark. Furthermore, our evidence re-ranking and element-wise modality fusion enable RED-DOT to surpass the SotA on NewsCLIPpings+ by up to 3% without requiring numerous evidence items or multiple backbone encoders. We release our code at: https://github.com/stevejpapad/relevant-evidence-detection.
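To make the abstract's core idea concrete, the sketch below illustrates one plausible reading of it: element-wise fusion of image and text embeddings into a claim representation, a transformer that lets the claim attend over retrieved evidence, and a per-evidence relevance head alongside a veracity head. This is a minimal, hypothetical illustration, not the authors' RED-DOT implementation; all module names, dimensions, and the exact fusion recipe are assumptions (the official code is at the repository linked above).

```python
# Hypothetical sketch of "relevant evidence detection": score each retrieved
# evidence item for relevance to an image-text claim, then predict a verdict.
# Not the authors' architecture; dimensions and fusion recipe are assumed.
import torch
import torch.nn as nn


class RelevantEvidenceDetector(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 2):
        super().__init__()
        # Element-wise modality fusion: concatenate the image and text
        # embeddings with their Hadamard product and absolute difference
        # (a common recipe; the paper's exact fusion may differ).
        self.fuse = nn.Linear(4 * dim, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        self.relevance_head = nn.Linear(dim, 1)  # per-evidence relevance logit
        self.verdict_head = nn.Linear(dim, 1)    # truthful vs. misleading logit

    def forward(self, img, txt, evidence):
        # img, txt: (batch, dim); evidence: (batch, n_evidence, dim)
        claim = self.fuse(
            torch.cat([img, txt, img * txt, (img - txt).abs()], dim=-1)
        )
        # Self-attention over [claim token; evidence tokens] lets the claim
        # and each evidence item contextualize one another.
        tokens = torch.cat([claim.unsqueeze(1), evidence], dim=1)
        out = self.encoder(tokens)
        relevance = self.relevance_head(out[:, 1:]).squeeze(-1)  # (batch, n_evidence)
        verdict = self.verdict_head(out[:, 0]).squeeze(-1)       # (batch,)
        return relevance, verdict


# Usage with pre-computed CLIP-style embeddings for 2 claims, 4 evidence items each.
model = RelevantEvidenceDetector()
img, txt = torch.randn(2, 512), torch.randn(2, 512)
evidence = torch.randn(2, 4, 512)
relevance_logits, verdict_logit = model(img, txt, evidence)
```

Under this reading, the relevance logits could drive the evidence re-ranking mentioned in the abstract (keeping only the highest-scoring items), which is why few evidence items and a single backbone encoder could suffice.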