Abstract: Numerous deep learning (DL)-based approaches have been developed for medical report generation (MRG), aiming to automate the description of medical images. These reports typically comprise two sections: the findings, which describe the visual content of the images, and the impression, which summarizes the diagnosis or assessment. Given the distinct abstraction levels of these two sections, conventional end-to-end DL methods that generate both simultaneously may not be optimal. To address this challenge, we introduce a novel Hierarchical Medical Report Generation Network (Hi-MrGn) designed to better reflect the inherent structure of medical reports. Hi-MrGn operates in two stages: it first generates the findings from multimodal input data, including medical images and auxiliary diagnostic texts, and then produces the impression from both the findings and the images. To enhance the semantic coherence between the findings and the impression, we incorporate a contrastive learning module into Hi-MrGn. We validate our approach on two public X-ray image datasets, MIMIC-CXR and IU-Xray, and demonstrate that our method surpasses current state-of-the-art (SOTA) techniques in this domain.
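To make the two-stage pipeline described above concrete, the sketch below gives one plausible reading of the abstract in PyTorch. All module names, feature dimensions, and the InfoNCE-style contrastive objective between findings and impression representations are assumptions made for illustration; the authors' actual architecture and loss may differ.

```python
# Minimal sketch of a two-stage findings-then-impression generator with a
# contrastive coherence term, assuming generic Transformer decoder stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiMrGnSketch(nn.Module):
    def __init__(self, d_model=512, vocab_size=10000):
        super().__init__()
        self.image_encoder = nn.Linear(2048, d_model)            # stand-in for a CNN/ViT backbone
        self.text_embed = nn.Embedding(vocab_size, d_model)      # shared token embedding
        self.findings_decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.impression_decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)                  # projection head for contrastive loss

    def forward(self, image_feats, aux_text_ids, findings_ids, impression_ids):
        # Stage 1: generate findings from the image and auxiliary diagnostic text
        # (teacher forcing shown for brevity).
        img = self.image_encoder(image_feats)                    # (B, N, d)
        aux = self.text_embed(aux_text_ids)                      # (B, T_aux, d)
        memory = torch.cat([img, aux], dim=1)
        findings_h = self.findings_decoder(self.text_embed(findings_ids), memory)

        # Stage 2: generate the impression conditioned on the findings and the image.
        memory2 = torch.cat([img, findings_h], dim=1)
        impression_h = self.impression_decoder(self.text_embed(impression_ids), memory2)

        # Contrastive module: pull matched findings/impression representations together,
        # push apart mismatched pairs within the batch (InfoNCE-style, assumed).
        f = F.normalize(self.proj(findings_h.mean(dim=1)), dim=-1)    # (B, d)
        i = F.normalize(self.proj(impression_h.mean(dim=1)), dim=-1)  # (B, d)
        logits = f @ i.t() / 0.07                                     # temperature-scaled similarity
        targets = torch.arange(f.size(0), device=f.device)
        contrastive_loss = F.cross_entropy(logits, targets)
        return findings_h, impression_h, contrastive_loss
```

In this reading, the contrastive term would be added to the usual token-level generation losses of both stages so that the impression remains semantically consistent with the generated findings.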
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: data-to-text generation, multimodal applications, healthcare applications
Contribution Types: Theory
Languages Studied: English
Keywords: medical report generation, multimodal fusion, hierarchical structure
Submission Number: 4914