Improving Image Captioning with Image Concepts of Words

Yiyu Wang, Xunzhi Xiang, Kun Jing, Jungang Xu, Yingfei Sun

Published: 2024, Last Modified: 01 Aug 2025, KSEM (2) 2024, CC BY-SA 4.0

Abstract: Image captioning has made significant progress recently with encoder-decoder architectures, especially since the introduction of attention mechanisms and the Transformer architecture. However, almost all existing work uses the same word embedding module and does not consider the impact of word shape (glyph) information. In this paper, we first generate a low-resolution glyph image for each word, and then propose a new word embedding module, the Injecting Image Concept (IIC) module, which extracts an image concept from the glyph image and embeds it into the raw word embedding vector, with the aim of enhancing the expressive ability of word embeddings in image captioning models. Extensive experimental results validate the effectiveness and superiority of our approach.
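
The two steps named in the abstract (rendering a low-resolution glyph image per word, then injecting a concept extracted from it into the word embedding) can be illustrated with a minimal sketch. This is not the IIC module itself: the abstract does not specify the glyph resolution, the concept extractor, or the fusion rule, so the 32x16 glyph size, the single linear projection with `tanh`, the additive fusion, and the names `render_glyph` and `inject_image_concept` are all assumptions made for illustration. The sketch assumes Pillow and NumPy are installed.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

GLYPH_SIZE = (32, 16)  # (width, height); assumed value, the abstract only says "low-resolution"
EMBED_DIM = 8          # toy embedding width, chosen for illustration


def render_glyph(word, size=GLYPH_SIZE):
    """Draw `word` in white on a black canvas and shrink it to a small grayscale array."""
    font = ImageFont.load_default()
    canvas = Image.new("L", (10 * max(len(word), 1), 16), color=0)
    ImageDraw.Draw(canvas).text((0, 0), word, fill=255, font=font)
    small = canvas.resize(size)  # fixed low resolution for every word
    return np.asarray(small, dtype=np.float32) / 255.0  # shape (height, width), values in [0, 1]


def inject_image_concept(word_vec, glyph, proj):
    """Project the flattened glyph to the embedding width and add it to the word vector.

    Assumption: a linear projection plus additive fusion stands in for the
    unspecified concept extractor and injection step.
    """
    concept = np.tanh(glyph.reshape(-1) @ proj)
    return word_vec + concept


rng = np.random.default_rng(0)
proj = rng.normal(scale=0.01, size=(GLYPH_SIZE[0] * GLYPH_SIZE[1], EMBED_DIM))
word_vec = rng.normal(size=EMBED_DIM)  # stand-in for a learned raw word embedding

glyph = render_glyph("cat")
fused = inject_image_concept(word_vec, glyph, proj)
```

In a trained captioning model, `proj` and the raw word embeddings would be learned jointly with the rest of the network; here they are random placeholders so that the shapes and data flow can be checked.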