Learning Compact Reward for Image Captioning

Nannan Li; Zhenzhong Chen

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: image captioning, adversarial learning, inverse reinforcement learning, vision, language

TL;DR: A refined AIRL algorithm that learns a compact reward for image captioning.

Abstract: Adversarial learning has shown advantages in generating natural and diverse descriptions for image captioning. However, the reward learned by existing adversarial methods is vague and ill-defined because of the reward ambiguity problem. In this paper, we propose a refined Adversarial Inverse Reinforcement Learning (rAIRL) method that handles reward ambiguity by disentangling the reward for each word in a sentence, and that stabilizes adversarial training by refining the loss function to shift the stationary point towards the Nash equilibrium. In addition, we introduce a conditional term in the loss function to mitigate mode collapse and increase the diversity of the generated descriptions. Our experiments on MS COCO show that our method can learn a compact reward for image captioning.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/learning-compact-reward-for-image-captioning/code)
