QRad: Enhancing Radiology Report Generation by Captioning-to-VQA Reframing

Published: 12 Oct 2025, Last Modified: 13 Oct 2025
Venue: GenAI4Health 2025 Poster
License: CC BY 4.0
Keywords: Radiology Report Generation, Image Captioning, Medical Imaging
Abstract: Radiology report generation using AI has demonstrated significant potential in modern clinical workflows. However, existing approaches have limited clinical utility: they lack interactive capabilities, and their factual reliability is compromised because linguistic variations prevalent in the training data lead to overfitting. We introduce QRad, a novel approach that reframes radiology report generation from image captioning to a self-directed visual question-answering (VQA) process. Specifically, we convert radiology reports into question-answer pairs and train our model to first generate a chain of questions and then respond with answers. The answers are concatenated to form the radiology report. Our approach offers three advantages. First, quality is considerably improved (by 10.5% in RadGraph-F1) because linguistic variations (such as the omission or ordering of medical topics) are removed from the answer-generation criterion, allowing the model to focus on factual accuracy rather than presentation style. Second, the model provides an intrinsic VQA capability that enables physicians to query the model for details that may have been omitted from the initial output. Third, QRad derives confidence scores from token probabilities by answering template questions about specific medical conditions, a capability unavailable in previous models; this enables Receiver Operating Characteristic (ROC) based evaluation, which can facilitate regulatory approval. Experiments show that QRad outperforms state-of-the-art models while being only 13% of their size, offering a promising path toward clinical adoption and regulatory validation in real-world settings.
Submission Number: 167
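
The abstract describes two mechanical steps that a short sketch may help make concrete: concatenating answers into a report, and turning token probabilities for a template question into a confidence score that supports ROC analysis. The Python below is only an illustration under stated assumptions, not the authors' implementation. It assumes that template questions are answered with a "yes" or "no" token and that the confidence is a two-way softmax over those two logits; the abstract does not say how the probabilities are normalised. All function names and the example inputs are hypothetical.

```python
import math


def assemble_report(qa_pairs):
    """Concatenate the answers of (question, answer) pairs into one report string."""
    return " ".join(answer.strip() for _, answer in qa_pairs)


def yes_confidence(logit_yes, logit_no):
    """Assumed scoring rule: two-way softmax over the 'yes' and 'no' token logits."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: (logit_yes, logit_no) for the template question
# "Is there a pleural effusion?" on four studies, with made-up ground truth.
logits = [(2.0, -1.0), (0.5, 0.0), (-0.5, 1.0), (-2.0, 2.0)]
labels = [1, 1, 0, 0]
scores = [yes_confidence(y, n) for y, n in logits]
auc = roc_auc(scores, labels)
```

Because each condition yields a continuous score rather than a hard label, a decision threshold can be swept over `scores` to trace the full ROC curve, which is the kind of evaluation the abstract refers to.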