RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance

Published: 27 Mar 2025, Last Modified: 30 Mar 2025 · MIDL 2025 · CC BY 4.0
Keywords: Interactive Radiology Assistance, LVLMs, Report Generation, Chest X-Rays
TL;DR: We propose RaDialog, a collaborative radiology assistant focusing on automated report generation and auxiliary interactive downstream tasks for chest X-rays.
Abstract: Conversational AI tools for generating and discussing accurate radiology reports could transform radiology by enabling collaborative, human-in-the-loop diagnostic processes, saving time and enhancing report quality. While Large Vision-Language Models hold promise to this end, current methods either lack clinical correctness or are single-task models without conversational abilities. We propose a novel architecture and dataset to address these limitations. First, we propose a secondary image branch that explicitly focuses on structured clinical findings, improving the clinical correctness score by 13.3%. Second, we propose a catastrophic forgetting mitigation strategy and an instruct dataset with varied dialog-based tasks, enabling our model to handle a multitude of different queries. RaDialog marks a foundational step toward clinical dialog systems, outperforming existing medical LVLMs by 15.0% in clinical correctness in report generation and by 23.4% in interactive report correction, and it is preferred by radiologists in 84.0% of cases over a comparative method. Our model and dataset are publicly available (https://github.com/ChantalMP/RaDialog and https://physionet.org/content/radialog-instruct-dataset/1.1.0/).
Primary Subject Area: Foundation Models
Secondary Subject Area: Detection and Diagnosis
Paper Type: Methodological Development
Registration Requirement: Yes
Reproducibility: https://github.com/ChantalMP/RaDialog
Latex Code: zip
Copyright Form: pdf
Submission Number: 89