Enhancing Multimodal Rumor Detection with Statistical Image Features and Modal Alignment via Contrastive Learning
Abstract: The rapid spread of multimodal rumors on social media, particularly those involving manipulated images and complex intermodal interactions, poses a significant challenge to existing detection methods. To improve the detection of tweets with tampered images, we exploit statistical image features, such as the mean and variance of pixel intensities, which effectively capture spatial attributes. To handle complex intermodal correlations, we introduce a contrastive learning approach that aligns feature representations across modalities. We further design a cross-attention fusion module (CAFM) that strengthens the integration of the image and text modalities. Combining these components, we propose the contrastive cross-attention fusion network (ConCAFN) for robust multimodal rumor detection. Extensive experiments on two real-world datasets show that the model detects multimodal rumors more accurately, demonstrating the effectiveness of our approach.
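To make the three components named in the abstract concrete, the following is a minimal PyTorch-style sketch of (1) statistical image features, (2) an InfoNCE-style contrastive alignment loss, and (3) a cross-attention fusion module. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: all class and function names, dimensions, and the InfoNCE formulation of the contrastive objective are assumptions for illustration only.

```python
# Hypothetical sketch of the three components described in the abstract.
# Names, shapes, and loss formulation are illustrative assumptions, not ConCAFN's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def statistical_image_features(img: torch.Tensor) -> torch.Tensor:
    """Per-channel mean and variance over spatial dims: (B, C, H, W) -> (B, 2C)."""
    mean = img.mean(dim=(2, 3))
    var = img.var(dim=(2, 3))
    return torch.cat([mean, var], dim=1)


def contrastive_alignment_loss(img_feat: torch.Tensor,
                               txt_feat: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style loss: matched image/text pairs in a batch are
    pulled together, mismatched pairs pushed apart (one common way to align
    modalities with contrastive learning)."""
    img_feat = F.normalize(img_feat, dim=1)
    txt_feat = F.normalize(txt_feat, dim=1)
    logits = img_feat @ txt_feat.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


class CrossAttentionFusion(nn.Module):
    """Hypothetical CAFM: each modality attends over the other, and the two
    attended representations are concatenated and projected into a joint one."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # Image tokens query the text tokens, and vice versa.
        i2t, _ = self.img_to_txt(img_tokens, txt_tokens, txt_tokens)
        t2i, _ = self.txt_to_img(txt_tokens, img_tokens, img_tokens)
        # Pool each attended sequence and fuse into a single joint vector.
        fused = torch.cat([i2t.mean(dim=1), t2i.mean(dim=1)], dim=-1)
        return self.fuse(fused)  # (B, dim) joint multimodal representation
```

In a pipeline of this shape, the statistical features would be concatenated with learned image embeddings before fusion, the contrastive loss would be added to the classification loss as an auxiliary alignment term, and the fused vector from the cross-attention module would feed a rumor/non-rumor classifier head.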