Visual Language Transformers for Chest X-ray Report Generation with Longitudinal Comparison

Published: 26 May 2026, Last Modified: 26 May 2026 | GDSS 2026 | CC BY 4.0
Keywords: Multimodal Learning, Vision-Language Models, Chest X-ray, Automated Report Generation, Longitudinal Analysis, Radiology
TL;DR: SuSufDoctor is a multimodal chest X-ray report generation system that integrates current and prior images with patient data, improving report accuracy and speed for resource-limited settings.
Abstract: Chest X-ray interpretation is a critical diagnostic task, yet radiologists in low-resource settings often face high workloads and long reporting times because of severe workforce shortages. Current automated report generation systems rely primarily on single-image analysis and do not incorporate longitudinal comparisons or patient metadata, which limits their clinical usefulness. This project addresses these gaps by developing SuSufDoctor, a multimodal chest X-ray report generation system powered by a fine-tuned SmolVLM-500M vision-language transformer. The system integrates current and prior chest X-ray images, radiology reports, and patient metadata to produce structured reports with Findings and Impression sections. A longitudinal multimodal dataset was constructed from the CheXpert-Plus dataset to train and evaluate the model. With LoRA-based fine-tuning and quantization, the model's scores improved substantially: BLEU rose from 1.29% to 61.53%, ROUGE-L from 8.26% to 66.08%, and BERTScore (F1) from 80.48% to 93.92%. The system was deployed as a web application, enabling real-time inference and practical integration into radiologist workflows. Overall, the results indicate that SuSufDoctor can enhance diagnostic support, reduce reporting burden, and improve access to quality radiology services in resource-limited environments such as Ghana and Rwanda, which have about 150 and 10 radiologists, respectively.
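For readers unfamiliar with the ROUGE-L figure quoted in the abstract, the following is a minimal sketch of how a ROUGE-L F1 score can be computed between a reference report and a generated report. It is not the authors' evaluation code: the function names `lcs_length` and `rouge_l_f1`, the lowercase whitespace tokenization, and the choice of a balanced F1 are all assumptions made for illustration, and the sample sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical report snippets, for illustration only.
    reference = "No acute cardiopulmonary process"
    candidate = "No acute process"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.4f}")
```

Because ROUGE-L rewards the longest in-order word overlap, a generated report that keeps the reference's phrasing but omits a finding still scores fairly well, which is one reason the abstract reports it alongside BLEU and BERTScore rather than alone.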
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 11