Intelligent Image Captioning with Several Language Models

Yue Chen, Yingnan Ju, Kenneth Steimel

08 Sept 2020, OpenReview Archive Direct Upload. Readers: Everyone

Abstract: This project aims to caption images that show common objects in context. We propose to use a deep neural network to localize and recognize objects within an image and then produce a caption in natural language. Often, image captioning is reduced to a plain recognition task that labels multiple objects per image. However, generating an actual caption is a more difficult, and potentially more interesting, problem. While most modern systems label the objects in an image very accurately, labeling the actions in an image is a much more challenging task. We will incorporate deep image recognition to generate a simple sentence expressing the action in the image. Our main contribution over previous research in image captioning is the correction of the decoded text representation. We will use language models based on templating, with regard for semantic similarity, and graph-based dependency parses to fix errors that occur during generation.
