An Effective Multimodal Rumor Detection Model via Image Semantic Enhancement and Hierarchical Fusion

Published: 2025, Last Modified: 21 Jan 2026, Venue: IJCNN 2025, License: CC BY-SA 4.0
Abstract: The rapid development of social platforms and multimedia technologies has led to the proliferation of multimodal rumors, and automatic multimodal rumor detection has consequently received extensive attention from researchers. Although many existing methods identify multimodal rumors effectively, they fall short in two respects: fully utilizing the information contained in images, and performing fine-grained fusion of features from different modalities. In this paper, we propose ISEHF, a model for multimodal rumor detection based on Image Semantic Enhancement and Hierarchical Fusion. Specifically, ISEHF performs image semantic enhancement by employing a large vision-language model to generate image captions, which is critical to fully utilizing the information contained in images. Moreover, we propose a hierarchical fusion module, consisting of a shallow fusion module and a deep fusion module, that generates richer features at different levels for fine-grained fusion of features from different modalities. Extensive experiments on two public datasets demonstrate the superiority of our model over state-of-the-art baselines.
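
The abstract names the two fusion levels but does not specify their internals, so the following is only an illustrative sketch of how a shallow-plus-deep fusion over text, image, and caption features could be wired together. Everything concrete in it is an assumption rather than the authors' design: shallow fusion is taken to be concatenation of mean-pooled features, deep fusion is taken to be scaled dot-product cross-attention from text tokens to image and caption tokens, and the function names (`shallow_fusion`, `deep_fusion`, `hierarchical_fusion`) are hypothetical. The features are random stand-ins for encoder outputs; no vision-language model is called.

```python
import math
import random

Tokens = list[list[float]]  # a sequence of equally sized feature vectors


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens: Tokens) -> list[float]:
    """Average a token sequence into a single vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def cross_attention(queries: Tokens, keys_values: Tokens) -> Tokens:
    """Scaled dot-product attention with no learned projections (a simplification)."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(dim)]
        )
    return attended


def shallow_fusion(text: Tokens, image: Tokens, caption: Tokens) -> list[float]:
    # Assumption: shallow fusion = concatenation of pooled per-modality vectors.
    return mean_pool(text) + mean_pool(image) + mean_pool(caption)


def deep_fusion(text: Tokens, image: Tokens, caption: Tokens) -> list[float]:
    # Assumption: deep fusion = text tokens attending over image and caption tokens.
    return mean_pool(cross_attention(text, image + caption))


def hierarchical_fusion(text: Tokens, image: Tokens, caption: Tokens) -> list[float]:
    # Combine both levels into one feature vector for a downstream classifier.
    return shallow_fusion(text, image, caption) + deep_fusion(text, image, caption)


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    make = lambda n: [[random.random() for _ in range(dim)] for _ in range(n)]
    text, image, caption = make(5), make(3), make(6)  # placeholder encoder outputs
    fused = hierarchical_fusion(text, image, caption)
    print(len(fused))  # 3 * dim shallow features plus dim deep features
```

In a trained model the pooled and attended features would pass through learned projections before classification; those are omitted here to keep the sketch dependency-free.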