Abstract: Pun memes, which combine wordplay with visual elements, represent a popular form of humor in Chinese online communications.
Despite their prevalence, the ability of current Vision-Language Models (VLMs) to understand and apply these culturally specific multimodal expressions has not been systematically evaluated. In this paper, we introduce PunMemeCN, a novel benchmark designed to assess VLMs' capabilities in processing Chinese pun memes across three progressive tasks: pun meme detection, sentiment analysis, and chat-driven meme response. PunMemeCN consists of 1,959 Chinese memes (653 pun memes and 1,306 non-pun memes) with comprehensive annotations of punchlines, sentiments, and explanations, alongside 2,008 multi-turn chat conversations incorporating these memes. Our experiments indicate that state-of-the-art VLMs struggle with Chinese pun memes, particularly with homophone wordplay, even with Chain-of-Thought prompting. Notably, punchlines in memes can effectively conceal potentially harmful content from AI detection. These findings underscore the challenges in cross-cultural multimodal understanding and highlight the need for culture-specific approaches to humor comprehension in AI systems.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, corpus creation, evaluation, reproducibility
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: Chinese
Keywords: benchmarking, corpus creation, evaluation, reproducibility
Submission Number: 5494