Abstract: In this paper, we propose a Weighted Cross-modal Aggregation network (WCAN) for rumor detection that combines highly correlated features across modalities and obtains a unified representation in a shared space. WCAN exploits adversarial training, adding perturbations to text features to enhance model robustness. Specifically, we devise a weighted cross-modal aggregation (WCA) module that measures the distance between the text, image, and social-graph modality distributions using KL divergence, thereby leveraging correlations between modalities. An MSE loss progressively draws the fusion features closer to the original image and social-graph features while retaining information from every modality. In addition, WCAN includes a feature fusion module that uses dual-modal co-attention blocks to dynamically adjust the features of the three modalities. Experiments on two datasets, WEIBO and PHEME, demonstrate the superior performance of the proposed method.
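To make the WCA mechanism described above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it assumes the module name `WeightedCrossModalAggregation`, per-modality linear projections, and a softmax-over-negative-KL weighting scheme, none of which are specified in the abstract beyond the KL-based distance and the MSE alignment terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedCrossModalAggregation(nn.Module):
    """Hypothetical sketch of a WCA-style module.

    Weights the image and social-graph modalities by their KL-divergence
    affinity to the text modality, aggregates them into one fused vector,
    and returns an MSE term pulling the fusion toward the original
    image/graph features (as the abstract describes).
    """

    def __init__(self, dim: int):
        super().__init__()
        # Assumed: one linear projection per modality into a shared space.
        self.proj = nn.ModuleDict(
            {m: nn.Linear(dim, dim) for m in ("text", "image", "graph")}
        )

    def forward(self, text, image, graph):
        feats = {
            "text": self.proj["text"](text),
            "image": self.proj["image"](image),
            "graph": self.proj["graph"](graph),
        }
        # Treat softmaxed features as distributions and measure how far the
        # image/graph modalities are from text; a smaller divergence means
        # a stronger correlation and hence a larger aggregation weight.
        log_p_text = F.log_softmax(feats["text"], dim=-1)
        divs = torch.stack(
            [
                F.kl_div(
                    log_p_text,
                    F.softmax(feats[m], dim=-1),
                    reduction="none",
                ).sum(-1)
                for m in ("image", "graph")
            ],
            dim=-1,
        )  # shape: (batch, 2)
        weights = F.softmax(-divs, dim=-1)  # closer modality weighs more
        fused = (
            feats["text"]
            + weights[..., 0:1] * feats["image"]
            + weights[..., 1:2] * feats["graph"]
        )
        # Auxiliary MSE terms draw the fused representation toward the
        # original image and graph features.
        align_loss = F.mse_loss(fused, image) + F.mse_loss(fused, graph)
        return fused, align_loss
```

Under these assumptions, the softmax over negative divergences is one simple way to turn the KL distances into aggregation weights; the paper's exact weighting function may differ.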