Long-Tail Classification for Distinctive Image Captioning: A Simple yet Effective Remedy for Side Effects of Reinforcement Learning

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: Distinctiveness is a desirable feature of image captions: a caption should cover the characteristic details of its input image. However, recent high-performing captioning models trained with reinforcement learning (RL) tend to generate overly generic captions, despite performing well on various other criteria. Interestingly, it has also been reported that their outputs are composed of a limited number of common words and rarely contain tail-class words, i.e., low-frequency words in the training corpus. Vocabulary size is closely related to distinctiveness, as it is difficult for a model to describe details beyond its vocabulary. Based on this insight, we hypothesize that the limited vocabulary of RL models is the major factor limiting their distinctiveness. We recast distinctive image captioning as a simpler task of long-tail classification to increase the vocabulary, and then propose lightweight fine-tuning methods to encourage tail-class word generation. The experimental results demonstrate that our methods significantly enhance the distinctiveness of existing RL models, as well as their vocabulary size, without sacrificing quality. Our methods also outperform previous distinctiveness-aware methods at a small computational cost, requiring only minor modifications to pre-trained RL models.
Paper Type: long
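
The abstract ties distinctiveness to the vocabulary size of generated captions and to how often they contain tail-class (low-frequency) words. As a rough illustration of those two quantities, and not the paper's evaluation protocol, the sketch below counts them for a set of generated captions. The function name `tail_word_stats`, the whitespace tokenization, and the `tail_max_count` frequency threshold are all assumptions made for this example.

```python
from collections import Counter


def tail_word_stats(train_captions, generated_captions, tail_max_count=2):
    """Return (vocabulary size, tail-word rate) for generated captions.

    Hypothetical helper: a word counts as tail-class if it appears in the
    training captions at most `tail_max_count` times (assumed threshold).
    """
    # Word frequencies in the training corpus decide which words are "tail".
    train_freq = Counter(
        w for caption in train_captions for w in caption.lower().split()
    )
    gen_tokens = [
        w for caption in generated_captions for w in caption.lower().split()
    ]
    # Vocabulary size: number of distinct words the model actually produced.
    vocab = set(gen_tokens)
    # Tail tokens: generated words that are rare (but present) in training.
    tail = [w for w in gen_tokens if 0 < train_freq[w] <= tail_max_count]
    rate = len(tail) / len(gen_tokens) if gen_tokens else 0.0
    return len(vocab), rate
```

For example, calling `tail_word_stats(train, outputs, tail_max_count=1)` on the outputs of two models would show whether one of them uses a wider vocabulary and more rare words than the other, under the assumptions stated above.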