From Generalist to Specialist: Incorporating Domain-Knowledge into Flamingo for Chest X-Ray Report Generation
Keywords: Vision-language model, chest X-ray, report generation
Abstract: Automating the generation of accurate and reliable radiological reports from chest X-ray
images represents a significant challenge in medical image computing. In this context,
Vision-Language Models (VLMs), particularly the Flamingo architecture, which achieves
state-of-the-art performance across various vision-language tasks, offer promising solutions.
This study evaluates the effectiveness of OpenFlamingo and its medical adaptation
MedFlamingo, a version further pre-trained on medical data, in generating radiological
reports. Our evaluation compares the zero-shot capabilities of OpenFlamingo and MedFlamingo
against fine-tuning and training from scratch. Our results demonstrate that
fine-tuning consistently boosts model performance, with fine-tuned MedFlamingo outperforming
its OpenFlamingo counterpart. Moreover, while training Flamingo from scratch
does not match the efficacy of fine-tuning, it nevertheless surpasses zero-shot performance.
This study underscores the potential of domain-specific fine-tuning in enhancing automated
radiological report generation, paving the way for more accurate and efficient diagnostic
workflows.
Submission Number: 94