Content-Specific Humorous Image Captioning Using Incongruity Resolution Chain-of-Thought

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone

Abstract: Although automated image captioning has benefited considerably from the development of large language models (LLMs), generating humorous captions remains a challenging task. Humorous captions written by humans are unique to an image and reflect its content, whereas captions generated by previous captioning models tend to be generic. We therefore propose incongruity-resolution chain-of-thought (IRCoT), a novel prompting framework that creates content-specific resolutions from fine details extracted from an image. Furthermore, we integrate logit bias and negative sampling to suppress the output of generic resolutions. Experiments with GPT-4V demonstrate that the proposed framework effectively generates humorous captions tailored to the content of the input image.

Paper Type: long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources

Languages Studied: English, Japanese
