RBBA: ResNet - BERT - Bahdanau Attention for Image Caption Generator

Published: 2023, Last Modified: 15 May 2025ICTC 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: In recent years, the topic of image caption generators has gained significant attention. Several successful projects have emerged in this field, showcasing notable advancements. Image caption generators automatically generate descriptive captions for images through the encoder and decoder mechanisms. The encoder leverages computer vision models, while the decoder utilizes natural language processing models. In this study, we aim to assess a comprehensive set of seven distinct methodologies, including six existing methods from prior research and one newly proposed. These methods are trained and evaluated with bilingual evaluation (BLEU) on the Flickr8K dataset. In our experiments, the proposed ResNet50 – BERT – Bahdanau Attention model outperforms the other models in terms of the BLEU-1 score of 0.532143 and BLEU-4 score of 0.126316.
Loading