Context-enhanced framework for medical image report generation using multimodal contexts

Published: 01 Jan 2025 · Last Modified: 24 Jul 2025 · Knowledge-Based Systems, 2025 · CC BY-SA 4.0
Abstract: As deep learning technology continues to advance, including large language models and multimodal models, its application in the medical field has become a widely recognized research topic. In this context, a series of automated systems based on deep learning have been developed that aim to generate text reports from medical images. However, current methods often generate reports based solely on the patient's images, overlooking the multimodal medical context, which encompasses factors such as clinical information, diagnostic results, and medical knowledge. This limitation restricts the clinical applicability of automatically generated reports. To address this issue, we propose a novel Context-Enhanced Framework for medical image report generation. Our approach integrates multiple multimodal contextual elements, including but not limited to clinical text, medical knowledge, diagnostic results, and image data, to enrich the report generation process. We evaluated this framework on two public chest X-ray datasets, IU-Xray and MIMIC-CXR, using standard natural language generation and clinical efficacy metrics. The results show state-of-the-art performance, indicating improvements in both linguistic quality and clinical accuracy. Our source code is available here.
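The abstract does not specify the framework's architecture; as a minimal illustration of the general idea it describes (projecting several contextual modalities into a shared space and fusing them into one representation a report decoder could condition on), the following NumPy sketch uses hypothetical dimensions and random, untrained projection matrices. The function names, dimensions, and fusion-by-concatenation choice are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Project a raw feature vector into the shared context space.
    return np.tanh(W @ x)

def fuse_contexts(image_feat, clinical_feat, knowledge_feat, dim=64):
    # One projection matrix per modality. These are randomly initialized
    # here purely for illustration; in a trained model they would be learned.
    Wi = rng.standard_normal((dim, image_feat.size)) / np.sqrt(image_feat.size)
    Wc = rng.standard_normal((dim, clinical_feat.size)) / np.sqrt(clinical_feat.size)
    Wk = rng.standard_normal((dim, knowledge_feat.size)) / np.sqrt(knowledge_feat.size)
    # Concatenate the per-modality embeddings into a single context
    # vector that a report decoder could attend over.
    return np.concatenate([encode(image_feat, Wi),
                           encode(clinical_feat, Wc),
                           encode(knowledge_feat, Wk)])

context = fuse_contexts(rng.standard_normal(2048),  # e.g. CNN image features
                        rng.standard_normal(768),   # clinical-text embedding
                        rng.standard_normal(768))   # medical-knowledge embedding
print(context.shape)  # (192,)
```

Concatenation is only one of several plausible fusion strategies (cross-attention or gated sums are common alternatives); the point is that the decoder sees clinical and knowledge context alongside the image, rather than the image alone.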