Abstract: Radiology report generation aims to automatically produce diagnostic reports from medical images, reducing radiologists’ workload. Most existing models use an encoder-decoder architecture in which a text decoder generates the report from encoded image tokens. However, these approaches have two major limitations: 1) they rely on single-view features or on multi-view features combined by simple static fusion, which fails to capture complementary information across views, and 2) they lack explicit disease-related diagnostic information during text decoding, which reduces the clinical accuracy and relevance of the generated reports. To address these limitations, this paper proposes MFDP, a novel framework that employs Multi-view Feature Integration and Enhanced Disease Prompting for Radiology Report Generation. Specifically, MFDP introduces two key innovations: 1) the Multi-view Feature Fusion (MFF) module dynamically integrates multi-view images (e.g., frontal and lateral views) through a multi-view attention mechanism that adaptively captures inter-view dependencies, enriching the decoder’s input features so that it can generate more comprehensive reports; and 2) the Enhanced Disease Prompting (EDP) module provides explicit diagnostic information by constructing enhanced disease prompts that guide the text decoding process. Experiments on two benchmark datasets, MIMIC-CXR and IU X-Ray, demonstrate that MFDP is competitive on both Clinical Efficacy (CE) and Natural Language Generation (NLG) metrics. Notably, MFDP achieves a 10% average improvement in CE Recall over state-of-the-art (SOTA) models, enabling more precise localization of critical abnormalities while maintaining diagnostic completeness.
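To make the contrast between static fusion and attention-based fusion concrete, the following is a minimal sketch of cross-view attention in plain Python. It is not the MFF module from the paper: the abstract does not specify its architecture, so the function names (`cross_view_attention`, `fuse_views`), the residual sum, the absence of learned projections, and the toy token values are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_view_attention(query_tokens, context_tokens):
    """For each query token, return an attention-weighted mix of context tokens.

    Hypothetical helper: uses scaled dot-product scores with no learned
    query/key/value projections, which a real model would normally include.
    """
    dim = len(query_tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for q in query_tokens:
        weights = softmax([dot(q, k) / scale for k in context_tokens])
        mixed = [
            sum(w * k[i] for w, k in zip(weights, context_tokens))
            for i in range(dim)
        ]
        attended.append(mixed)
    return attended


def fuse_views(frontal, lateral):
    """Assumed residual fusion: frontal token plus what it attends to laterally."""
    attended = cross_view_attention(frontal, lateral)
    return [
        [f + a for f, a in zip(f_tok, a_tok)]
        for f_tok, a_tok in zip(frontal, attended)
    ]


# Toy 2-dimensional tokens standing in for encoded image patches.
frontal = [[1.0, 0.0], [0.0, 1.0]]
lateral = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = fuse_views(frontal, lateral)
```

Unlike static fusion (for example, concatenating or averaging the two views with fixed weights), the weights here depend on the content of each frontal token, so different tokens draw on different parts of the lateral view.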
External IDs: dblp:journals/titb/ZhaoYWL26