Abstract: Automatic Report Generation (ARG), which aims to automatically produce textual observations from images, faces two challenges: the lack of coherence across multiple scenes and the difficulty of precisely describing multiple lesions. To explore the task of multi-scene multi-lesion report generation (MSMLRG) in one shot, this paper introduces a multi-scene multi-lesion report generation framework that extracts scene-report alignment relations and scene-topic relations. Specifically, we design a scene-report feature aligner that achieves fine-grained alignment of the different lesions in different scenes between images and reports, and we incorporate a topic-aware module that helps generate a topic-specific text vocabulary for each scene. We have successfully applied the framework to several automatic report generation models, and it performs well on automatic evaluation metrics. This one-shot framework for multi-scene report generation not only fills a gap in multi-scene image report generation but also effectively improves the accuracy and consistency of diagnostic reports.
External IDs: dblp:conf/icassp/YuanKHZZL25