SCARE: A Novel Framework to Enhance Chinese Harmful Memes Detection

Tianlong Gu, Mingfeng Feng, Xuan Feng, Xuemin Wang

Published: 01 Jan 2025, Last Modified: 08 Oct 2025. IEEE Trans. Affect. Comput. 2025. License: CC BY-SA 4.0

Abstract: Harmful meme detection is a challenging multimodal task that requires contextual background knowledge and comprehensive inference. Although several studies have addressed harmful meme detection in English, detecting harmful memes in Chinese remains an open problem. To bridge this gap, we construct a Chinese harmful meme detection dataset named CHMEMES. Furthermore, existing multimodal alignment methods perform poorly on harmful meme detection, where the image and text components are mismatched. To address this, we propose a multimodal framework, the Semantic Contrastive Alignment fRamEwork (SCARE), which fully represents both cross-modal and intra-modal information. For cross-modal information, we introduce a cross-modal contrastive alignment objective that maximizes the mutual information between image and text. For intra-modal information, we design a new intra-modal contrastive objective to learn more robust visual and textual representations. Moreover, we present a simple yet efficient vision prompt tuning paradigm for parameter-efficient harmful meme detection. We conduct extensive experiments on the constructed Chinese dataset and an existing English dataset. Experimental results show that our method outperforms state-of-the-art baselines in harmful meme detection.
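The abstract does not give SCARE's exact loss formulations, so the following is only a rough illustration of the two kinds of objective it names. It assumes the cross-modal objective takes the common symmetric InfoNCE (CLIP-style) form, where minimizing the loss maximizes a lower bound on image-text mutual information (I ≥ log N − L for a batch of N pairs), and it assumes the intra-modal objective resembles SimCLR's NT-Xent loss between two augmented views of samples from the same modality. All function names, temperatures, and toy embeddings are hypothetical, and plain Python lists stand in for encoder outputs.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: not the authors' implementation.
Matrix = Sequence[Sequence[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(u: Sequence[float], w: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, w))


def mean_diagonal_cross_entropy(logits: Matrix) -> float:
    """Average of -log softmax(row)[i]: column i is the positive for row i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def cross_modal_infonce(images: Matrix, texts: Matrix, temperature: float = 0.07) -> float:
    """Assumed symmetric InfoNCE: the i-th image and i-th text form a positive pair."""
    img = [l2_normalize(v) for v in images]
    txt = [l2_normalize(v) for v in texts]
    logits = [[dot(u, w) / temperature for w in txt] for u in img]  # image-to-text
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (mean_diagonal_cross_entropy(logits) + mean_diagonal_cross_entropy(logits_t))


def intra_modal_nt_xent(view_a: Matrix, view_b: Matrix, temperature: float = 0.5) -> float:
    """Assumed NT-Xent within one modality: view_a[i] and view_b[i] are views of sample i."""
    z = [l2_normalize(v) for v in list(view_a) + list(view_b)]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # Every other embedding in the 2n-sized batch is a candidate; self is excluded.
        sims: Dict[int, float] = {
            j: dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i
        }
        m = max(sims.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_z - sims[positive]
    return total / (2 * n)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    mismatched = [[0.1, 0.9], [0.9, 0.1]]
    # Aligned image-text pairs should incur a lower loss than mismatched ones.
    print(cross_modal_infonce(images, matched) < cross_modal_infonce(images, mismatched))
```

Running the file prints `True`. The symmetric form averages the image-to-text and text-to-image directions so that neither modality dominates the alignment; a practical implementation would compute the same quantities on batched tensors from the image and text encoders.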