From Generalist to Specialist: Incorporating Domain-Knowledge into Flamingo for Chest X-Ray Report Generation

Raphael Stock; Stefan Denner; Yannick Kirchhoff; Constantin Ulrich; Maximilian Rouven Rokuss; Saikat Roy; Nico Disch; Klaus Maier-Hein

Published: 27 Apr 2024, Last Modified: 29 May 2024

Venue: MIDL 2024 Short Papers

License: CC BY 4.0

Keywords: Vision-language model, chest X-ray, report generation

Abstract: Automating the generation of accurate and reliable radiological reports from chest X-ray images represents a significant challenge in medical image computing. In this context, Vision-Language Models (VLMs), particularly the Flamingo architecture, which achieves state-of-the-art performance across various vision-language tasks, offer promising solutions. This study evaluates the effectiveness of OpenFlamingo and its medical adaptation MedFlamingo, a version further pre-trained on medical data, in generating radiological reports. Our evaluation compares the zero-shot capabilities of OpenFlamingo and MedFlamingo against fine-tuning and training from scratch. Our results demonstrate that fine-tuning consistently boosts model performance, with fine-tuned MedFlamingo outperforming its OpenFlamingo counterpart. Moreover, while training Flamingo from scratch does not match the efficacy of fine-tuning, it nevertheless surpasses zero-shot performance. This study underscores the potential of domain-specific fine-tuning in enhancing automated radiological report generation, paving the way for more accurate and efficient diagnostic workflows.

Submission Number: 94
