Vi-DRSNet: A Novel Hybrid Model for Vietnamese Image Captioning in Healthcare Domain

Doanh C. Bui, Nghia Hieu Nguyen, Nguyen D. Vo, Uyen Han Thuy Thai, Khang Nguyen

Published: 2022, Last Modified: 11 Apr 2025, Venue: MAPR 2022, License: CC BY-SA 4.0

Abstract: Image captioning is an active research topic that draws on both the computer vision and natural language processing communities. In this paper, we present a novel hybrid model that effectively combines three modules: Dual-level Collaborative, Meshed-memory Decoder, and Adaptive Decoder. Specifically, we use Dual-level Collaborative to integrate grid features and region features. The Meshed-memory Decoder is employed to take advantage of all encoder outputs. Finally, the idea of an Adaptive Decoder is applied to embed Vietnamese linguistic information into the decoding steps. Our approach achieves competitive results compared to other methods on the public and private tests of the VieCap4H benchmark without using any data augmentation.