GuardHarMem and HarMDetect: a multimodal dataset and benchmark model for fine-grained harmful meme classification

Published: 2025 · Last Modified: 12 Jan 2026 · Soc. Netw. Anal. Min. 2025 · CC BY-SA 4.0
Abstract: Harmful content on social media, especially in the form of memes, poses unique challenges for moderation and analysis. Memes combine text and imagery to convey complex messages rapidly and evoke emotional responses, enabling the dissemination of harmful ideas, reinforcement of stereotypes, and normalization of discriminatory behaviour, often eluding standard moderation tools. Existing datasets, such as Hateful Memes (\(\approx\)10 K samples) and Memotion (\(\approx\)8 K samples), focus on binary or coarse-grained labels and omit many common forms of harm. To address this gap, we introduce GuardHarMem, a new corpus of \(\approx\)16.6 K memes annotated with fine-grained categories including racism, mockery, and promotion of harmful substances. We also present HarMDetect, a practical baseline multimodal classifier that integrates text, image, and automatically extracted captions. By applying targeted data augmentation strategies, we enhance model robustness on GuardHarMem; HarMDetect outperforms baseline transformers on both binary and multiclass classification. The dataset and code are publicly available at https://github.com/EL-Amrany/Harmful-memes-detection.
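The abstract describes a classifier that fuses text, image, and caption signals. A minimal sketch of one common way to combine such modalities, late fusion with a linear softmax head, is shown below; the function name, dimensions, and class set are illustrative assumptions, not the authors' actual HarMDetect architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(text_emb, image_emb, caption_emb, weights, bias):
    """Concatenate per-modality embeddings (late fusion) and apply a
    linear layer followed by a numerically stable softmax.
    All names here are hypothetical, for illustration only."""
    fused = np.concatenate([text_emb, image_emb, caption_emb])
    logits = weights @ fused + bias
    exp = np.exp(logits - logits.max())  # subtract max for stability
    return exp / exp.sum()

# Toy setup: 8-d embedding per modality, 4 fine-grained classes
# (e.g. racism, mockery, harmful substances, non-harmful).
D, C = 8, 4
probs = fuse_and_classify(
    rng.normal(size=D), rng.normal(size=D), rng.normal(size=D),
    rng.normal(size=(C, 3 * D)), np.zeros(C),
)
```

In practice the embeddings would come from pretrained text and vision encoders, and the head would be trained on the labelled corpus; this sketch only illustrates the fusion step.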