Abstract: In recent years, research on multimodal content with explicit semantics has made significant progress. However, research on content with implicit semantics, such as online memes, remains insufficient. Memes often convey implicit semantics through metaphors and may sometimes contain hateful information. To address this issue, researchers have proposed the task of hateful meme detection, opening up new avenues for exploring implicit semantics. Hateful meme detection currently faces two main problems: 1) the rapid emergence of meme content makes continuous tracking and detection difficult; 2) current methods often lack interpretability, which limits understanding of and trust in the detection results. To better understand memes, we analyze the definition of metaphor from social science and identify three key factors of metaphor: socio-cultural knowledge, metaphorical tenor, and metaphorical representation pattern. Based on these key factors, we guide a multimodal large language model (MLLM) to infer the metaphors expressed in memes step by step. Specifically, we propose a hateful meme detection and interpretation framework with four modules. We first leverage a multimodal generative search method to obtain socio-cultural knowledge relevant to the visual objects in memes. Then, we use this socio-cultural knowledge to instruct the MLLM to assess the socio-cultural relevance scores between visual objects and textual information and to identify the metaphorical tenor of memes. Meanwhile, we apply a representative interpretation method to provide representative cases of memes and analyze these cases to explore the metaphorical representation pattern. Finally, a chain-of-thought prompt is constructed to integrate the outputs of the above modules, guiding the MLLM to accurately detect and interpret hateful memes. 
Our method achieves state-of-the-art performance on three hateful meme detection benchmarks and outperforms supervised models on the hateful meme interpretation benchmark.
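
The final integration step described in the abstract can be illustrated with a minimal sketch of how a chain-of-thought prompt might be assembled from the three metaphor factors. This is not the authors' implementation: the `MemeEvidence` container, the `build_cot_prompt` function, the prompt wording, and the assumption that the relevance score lies in [0, 1] are all hypothetical and chosen only for illustration. The sketch does not call any MLLM; it only shows how the outputs of the first three modules could be ordered into reasoning steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemeEvidence:
    """Hypothetical container for the outputs of the first three modules."""
    meme_text: str            # text overlaid on the meme
    knowledge: List[str]      # socio-cultural knowledge about the visual objects
    tenor: str                # metaphorical tenor identified for the meme
    relevance: float          # socio-cultural relevance score (assumed in [0, 1])
    pattern_cases: List[str]  # representative cases of the representation pattern


def build_cot_prompt(ev: MemeEvidence) -> str:
    """Order the three metaphor factors into a step-by-step prompt string."""
    knowledge = "\n".join(f"- {k}" for k in ev.knowledge)
    cases = "\n".join(f"- {c}" for c in ev.pattern_cases)
    return (
        f"Meme text: {ev.meme_text}\n\n"
        f"Step 1. Socio-cultural knowledge about the visual objects:\n{knowledge}\n\n"
        f"Step 2. Metaphorical tenor: {ev.tenor} "
        f"(relevance score: {ev.relevance:.2f})\n\n"
        f"Step 3. Representative cases of the metaphorical representation pattern:\n{cases}\n\n"
        "Step 4. Reasoning step by step from the above, state whether the meme is "
        "hateful (yes/no) and explain the metaphor it expresses."
    )
```

In a setup like this, the returned string would be passed to the MLLM together with the meme image. Keeping the factors as separate numbered steps is one way to make the model's eventual explanation traceable to the evidence that produced it.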
External IDs:dblp:journals/tcsv/WangWSTJL25