From Pixels to Punchlines: Investigating Figurative Meaning in Memes with VLMs

ACL ARR 2025 May Submission6427 Authors

20 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Figurative language is central to humorous and persuasive communication. Internet memes, a popular form of multimodal online communication, often use figurative elements to convey layered meaning through the combination of text and images, yet little is known about which elements vision-language models (VLMs) rely on to detect non-literal meaning in memes. To address this gap, we evaluate nine state-of-the-art generative VLMs on their ability to detect and differentiate six types of non-literal meaning in memes. Our results show that VLMs outperform a majority-vote baseline and, importantly, that their accuracy improves as the figurative complexity of memes increases. Model performance across figurative categories varies by modality: identifying irony relies on the text, whereas identifying anthropomorphism relies on the image. Although VLMs perform competitively on single-modality inputs, they fail to fully integrate multimodal content. We thus highlight both the capabilities and limitations of today's VLMs in figurative meme understanding.
Paper Type: Short
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: figurative language, figurative meaning, internet memes, multimodality, vision-language models
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Keywords: figurative language, figurative meaning, internet memes, multimodality, vision-language models
Submission Number: 6427