Multimodal metaphor recognition based on chain-of-cognition prompting

Dongyu Zhang, Xingyuan Lu, Mulin Zhuang, Senqi Yang, Hongjun Chen

Published: 2025, Last Modified: 23 Oct 2025. Cogn. Syst. Res. 2025. License: CC BY-SA 4.0

Abstract: Metaphor is a mode of thinking and cognition that is prevalent in human language. With the growth of social media and multimodal data, metaphor recognition research has expanded from the traditional unimodal scope (such as text or images) to multimodal settings. However, current multimodal metaphor processing methods focus mainly on techniques for fusing modalities such as text and image; they neglect the cognitive mechanism of metaphor as a way of thinking and make insufficient use of the pre-trained information in large language models. This paper therefore proposes a chain-of-cognition prompting (CoC) method for the multimodal metaphor recognition task, which makes full use of the pre-trained information of the large model in order to better recognize metaphors. The method uses prompts to construct inputs that guide the large language model to reason about the entities in a sample that may relate to the metaphorical source and target domains, and about the associations between those entities. At the same time, visual information is obtained through image caption extraction and a visual encoder, enabling the model to reason and produce metaphor recognition results. Experimental results show that the method outperforms the existing baseline model on the metaphor recognition task, which supports its effectiveness.
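The prompt-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual prompt wording, so the step texts in `COC_STEPS`, the `Sample` type, the `build_coc_prompt` helper, and the example sentence and caption are all hypothetical. The sketch covers only the textual side (text plus an image caption); the visual encoder and the call to a large language model are left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str     # text that accompanies the image (e.g. a slogan or post)
    caption: str  # caption produced from the image by some captioning model


# Hypothetical step wording; the paper's actual prompts are not in the abstract.
COC_STEPS = (
    "List the entities mentioned in the text and the image caption.",
    "Say which entities could belong to a metaphorical source domain "
    "and which to a target domain.",
    "Describe any association between the source-domain and "
    "target-domain entities.",
    "Answer 'metaphorical' or 'literal' based on the steps above.",
)


def build_coc_prompt(sample: Sample) -> str:
    """Assemble one staged prompt string from a text/caption pair."""
    lines = [
        f"Text: {sample.text}",
        f"Image caption: {sample.caption}",
        "Reason step by step:",
    ]
    # Number the reasoning steps so the model is asked to follow them in order.
    lines += [f"{i}. {step}" for i, step in enumerate(COC_STEPS, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example input, used only to show the shape of the prompt.
    demo = Sample(text="Time is money.", caption="a clock made of coins")
    print(build_coc_prompt(demo))
```

The resulting string would be passed to a language model of the reader's choice. Ordering the steps from entities, to domains, to associations, to the final label mirrors the reasoning order the abstract describes, but the exact decomposition here is an assumption.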

External IDs: dblp:journals/cogsr/ZhangLZYC25