Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor

ACL ARR 2026 January Submission2343 Authors

02 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Multimodal Benchmarking, Safety Alignment, Dark Humor, Implicit Reasoning, Arabic NLP, Computational Humor, Vision-Language Models.
Abstract: Dark humor exploits subtle cultural nuances and implicit cues, posing safety challenges that current static benchmarks fail to capture. To address this, we introduce a novel multimodal, multilingual benchmark for detecting and understanding harmful and offensive humor. Our manually curated dataset comprises 3,000 texts, 6,000 images, and 1,200 videos, spanning English, Arabic, and language-independent (universal) contexts. Unlike standard toxicity datasets, we enforce a strict annotation guideline: distinguishing Safe jokes from Harmful ones, with the latter further classified into Explicit (overt) and Implicit (covert) categories to probe deep reasoning. We systematically evaluate state-of-the-art (SOTA) open- and closed-source models across all modalities. Our findings reveal that closed-source models significantly outperform open-source ones, and that both groups show a notable performance gap between English and Arabic, underscoring the critical need for culturally grounded, reasoning-aware safety alignment. Warning: this paper contains example data that may be offensive, harmful, or biased.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: hate-speech detection, datasets for low resource languages, multilingual benchmarks, multimodal applications, safety and alignment
Contribution Types: Data resources
Languages Studied: English, Arabic
Submission Number: 2343