ESCNet: Entity-enhanced and Stance Checking Network for Multi-modal Fact-Checking

Fanrui Zhang, Jiawei Liu, Jingyi Xie, Qiang Zhang, Yongchao Xu, Zheng-Jun Zha

Published: 2024, Last Modified: 13 May 2025WWW 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Recently, misinformation incorporating both texts and images has been disseminated more effectively than those containing text alone on social media, raising significant concerns for multi-modal fact-checking. Existing research makes contributions to multi-modal feature extraction and interaction, but fails to fully enhance the valuable semantic representations or excavate the intricate entity information. Besides, existing multi-modal fact-checking datasets are primarily focused on English and merely concentrate on a single type of misinformation, thereby neglecting a comprehensive summary and coverage of various types of misinformation. Taking these factors into account, we construct the first large-scale Chinese Multi-modal Fact-Checking (CMFC) dataset which encompasses 46,000 claims. The CMFC covers all types of misinformation for fact-checking and is divided into two sub-datasets, Collected Chinese Multi-modal Fact-Checking (CCMF) and Synthetic Chinese Multi-modal Fact-Checking (SCMF). To establish baseline performance, we propose a novel Entity-enhanced and Stance Checking Network (ESCNet), which includes Multi-modal Feature Extraction Module, Stance Transformer, and Entity-enhanced Encoder. The ESCNet jointly models stance semantic reasoning features and knowledge-enhanced entity pair features, in order to simultaneously learn effective semantic-level and knowledge-level claim representations. Our work offers the first step and establishes a benchmark for evidence-based, multi-type, multi-modal fact-checking.