TRRG: Towards Truthful Radiology Report Generation With Cross-Modal Disease Clue Enhanced Large Language Models

Published: 2025, Last Modified: 22 Jan 2026, MICCAI (7) 2025, CC BY-SA 4.0
Abstract: The vision-language capabilities of multi-modal large language models have gained attention, but radiology report generation still faces challenges due to imbalanced data distribution and weak alignment between reports and radiographs. To address these issues, we propose TRRG, a stage-wise training framework for truthful radiology report generation. In the pre-training stage, contrastive learning enhances the visual encoder’s ability to capture fine-grained disease details. In the fine-tuning stage, our clue injection module improves disease perception by integrating robust zero-shot disease recognition. Finally, the cross-modal clue interaction module enables effective multi-granular fusion of visual and disease clue embeddings, significantly improving report generation and clinical effectiveness. Experiments on IU-Xray and MIMIC-CXR show that TRRG achieves state-of-the-art performance, enhancing disease perception and clinical utility.