Combining Object-Based Attention and Attributes for Image Captioning

Cong Li, Jiansheng Chen, Weitao Wan, Tianpeng Li

2017 (modified: 10 Nov 2022), ICIG (1) 2017. Readers: Everyone

Abstract: Image captioning is an active research topic at the intersection of computer vision and natural language processing. Recent image captioning models fall broadly into two classes: those based on visual attention and those based on semantic attributes. In this paper, we propose a novel image captioning system that models the relationship between semantic attributes and visual attention. In addition, unlike traditional attention models, which do not use object detectors and instead learn a latent alignment between regions and words, we propose an object attention system that incorporates the output of object detectors and can better attend to objects when generating the corresponding words. We evaluate our method on the MS COCO dataset, where our model outperforms many strong baselines.
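The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea: soft attention computed over regions proposed by an object detector, with the attention query influenced by predicted semantic attributes. It is not the authors' implementation. The additive (tanh) scoring form, the way attributes enter the query, and every name here (`object_attention`, `W_r`, `W_h`, `W_a`, `v`) are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def object_attention(region_feats, hidden, attr_probs, W_r, W_h, W_a, v):
    """Additive attention over detector regions (hypothetical formulation).

    region_feats: K feature vectors, one per detected object region
    hidden:       current decoder hidden state
    attr_probs:   predicted semantic attribute probabilities
    W_r, W_h, W_a, v: projection parameters (assumed, not from the paper)
    """
    # Assumed coupling: the query mixes the decoder state with attribute scores.
    query = [a + b for a, b in zip(matvec(W_h, hidden), matvec(W_a, attr_probs))]
    scores = []
    for feat in region_feats:
        proj = matvec(W_r, feat)
        scores.append(sum(vi * math.tanh(p + q) for vi, p, q in zip(v, proj, query)))
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(region_feats[0])
    context = [sum(w * f[d] for w, f in zip(weights, region_feats)) for d in range(dim)]
    return weights, context
```

In a captioning decoder, a context vector of this kind would typically be fed to the language model at each step, so that the word being generated is conditioned on the object regions receiving the most weight.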
