Attention-Based Medical Caption Generation with Image Modality Classification and Clinical Concept Mapping

Sadid A. Hasan, Yuan Ling, Joey Liu, Rithesh Sreenivasan, Shreya Anand, Tilak Raj Arora, Vivek V. Datla, Kathy Lee, Ashequl Qadir, Christine Leon Swisher, Oladimeji Farri

Published: 2017, Last Modified: 11 Oct 2025, Venue: CLEF 2017, License: CC BY-SA 4.0

Abstract: This paper proposes an attention-based deep learning framework for generating captions from medical images. We also propose using the same framework for clinical concept prediction, formulated as a sequence-to-sequence learning task, to improve caption generation. The predicted concept IDs are then mapped to the corresponding terms in a clinical ontology to generate an image caption. We also investigate whether learning to classify images by modality (e.g., CT scan, MRI) can aid in generating precise captions.
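
The concept-mapping step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the model has already produced a sequence of concept IDs and that the ontology is available as a simple ID-to-term lookup. The IDs, the terms, the `TOY_ONTOLOGY` table, and the `concepts_to_caption` helper are all hypothetical placeholders, not the authors' implementation or entries from a real clinical ontology, and joining terms with spaces is only one plausible way to form a caption.

```python
from typing import Dict, List

# Hypothetical toy ontology: placeholder IDs and terms, not real ontology entries.
TOY_ONTOLOGY: Dict[str, str] = {
    "ID001": "computed tomography",
    "ID002": "chest",
    "ID003": "lesion",
}


def concepts_to_caption(concept_ids: List[str], ontology: Dict[str, str]) -> str:
    """Join the ontology terms of the predicted IDs, skipping unknown and repeated IDs."""
    seen = set()
    terms = []
    for cid in concept_ids:
        term = ontology.get(cid)
        if term is None or cid in seen:
            continue  # ignore IDs missing from the ontology and duplicates
        seen.add(cid)
        terms.append(term)
    return " ".join(terms)


# Assumed model output: a predicted concept-ID sequence with a repeat and an unknown ID.
predicted = ["ID001", "ID002", "ID002", "ID999", "ID003"]
caption = concepts_to_caption(predicted, TOY_ONTOLOGY)
print(caption)
```

In this sketch, repeated and out-of-ontology IDs are dropped before the terms are joined; a real system would likely need a more careful policy for ordering and phrasing the mapped terms.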