Do AI Systems Understand What We Mean? A Survey of Multimodal Human Creative Expression Understanding

ACL ARR 2026 January Submission 8180 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Multimodal Human Creativity, Multimodal Large Language Models, Multimodal Reasoning, Meme, Humor, Advertising
Abstract: Multimodal human creativity, such as memes, cartoons, comics, advertisements, and humorous or satirical videos, presents unique challenges for AI systems due to its non-literal, culturally grounded, and rhetorically structured nature. While Multimodal Large Language Models (MLLMs) excel at physical-world understanding, they continue to struggle with creative communication. This survey provides a systematic overview of multimodal human creativity research. We first introduce a unified taxonomy that characterizes creative content along data forms, meaning-making mechanisms, and communicative goals. We then organize existing work into a capability-oriented hierarchy, spanning recognition, interpretation, and generation. Within this framework, we audit the architectural shifts that have shaped the transition from task-specific models to MLLMs. We conclude by identifying critical benchmark trends and open socio-technical challenges, providing a roadmap toward AI systems capable of sophisticated creative understanding.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: human behavior analysis, emotion detection and analysis, emoji prediction and analysis, human factors in NLP, multimodality
Contribution Types: Surveys
Languages Studied: English
Submission Number: 8180