Keywords: Harmful Content Detection, Meme Understanding
Abstract: Harmful memes pose a distinct challenge for content moderation: they compress social meaning into a visual template with short text, often relying on irony, intertextuality, and local cultural context. Current multimodal safety datasets focus either on narrow harm definitions or on predominantly English meme ecosystems, limiting progress toward robust, culturally grounded detection. We introduce CHARMemes, a large-scale benchmark for harm-aware meme understanding in the Russian-oriented online ecosystem (Runet). The dataset contains 23,025 in-the-wild memes collected from a template-centric public source and spans 2007–2025, enabling temporal and cross-domain analyses. We propose a taxonomy of 11 harm categories grouped into three severity buckets, and build the dataset with a scalable pipeline combining automated labeling, targeted human review, and multi-stage deduplication.
We benchmark multiple vision–language approaches under binary, severity-level, and fine-grained settings, and evaluate robustness across time and datasets. Results indicate that binary harmfulness detection is comparatively strong, while fine-grained harm recognition remains difficult, especially under temporal shift. CHARMemes offers a realistic testbed for developing more robust harmful meme detectors.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, multilingual corpora
Contribution Types: Data resources
Languages Studied: Russian, English
Submission Number: 5382