Clinically Interpretable Rule–Guided Preference Optimization in Vision–Language Models for Radiology Report Generation
Keywords: Radiology Report Generation, Medical AI, Preference Optimisation
Abstract: In modern healthcare, radiology plays a pivotal role in diagnosing and managing diseases. However, the complexity of medical image data, combined with the variability of natural language generation, often leads to inconsistencies, hallucinations, and a lack of clinical grounding in automatically generated radiology reports. To address these challenges, we introduce a clinically interpretable, rule-guided extension of direct preference optimization (DPO) tailored for radiology report generation. A typical radiology report comprises findings and an impression: the findings capture the complex visual information in the medical image (for example, a chest X-ray), and the impression is the conclusion they imply. Our framework exploits this structure to derive clinical rules from existing findings and impressions, each connecting findings to an impression as a Horn rule. The rules act as an additional, interpretable supervision signal, guiding the preference optimization of Vision–Language Models (VLMs) toward outputs that are not only fluent but also clinically faithful. Unlike conventional preference optimization, which relies solely on lexical preferences, our approach enforces alignment with clinically meaningful predicates such as the presence, absence, or severity of key findings. A central feature of the framework is its ability to inject clinical rule guidance during optimization, ensuring that generated reports remain both linguistically coherent and clinically accurate. By integrating a neural verifier trained to evaluate rule satisfaction, our method provides an explicit mechanism for grounding preferences in interpretable clinical semantics. Experimental results on benchmark datasets such as MIMIC-CXR-JPG and IU-Xray demonstrate that our approach substantially improves factual accuracy and overall report quality compared to supervised fine-tuning and standard DPO baselines. We record performance gains of 10% and 9% across lexical and semantic metrics. These results highlight the promise of clinically interpretable preference optimization as a pathway toward trustworthy and safe radiology report generation in medical AI.
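To make the idea concrete, the following is a minimal sketch of how a Horn rule linking findings to an impression could feed into a DPO-style loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how rule satisfaction enters the objective, so the margin term, the `HornRule` class, the predicate names, and the weight `lam` are all hypothetical, and a simple symbolic check stands in for the neural verifier.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HornRule:
    """Hypothetical rule: if every body predicate holds, the head must hold."""
    body: frozenset  # finding predicates, e.g. {"consolidation_present"}
    head: str        # impression predicate implied by the body


def rule_satisfaction(rules, findings, impression):
    """Fraction of applicable rules (body contained in findings) whose head
    appears in the impression. Returns 1.0 when no rule applies.
    A symbolic stand-in for the neural verifier mentioned in the abstract."""
    applicable = [r for r in rules if r.body <= findings]
    if not applicable:
        return 1.0
    return sum(r.head in impression for r in applicable) / len(applicable)


def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1, margin=0.0):
    """Standard DPO loss, -log sigmoid(beta * (policy/reference log-ratio gap)),
    with an optional margin subtracted inside the sigmoid."""
    logits = beta * ((logp_w - ref_w) - (logp_l - ref_l)) - margin
    return math.log1p(math.exp(-logits))  # equals -log(sigmoid(logits))


def rule_guided_dpo_loss(logp_w, logp_l, ref_w, ref_l, s_w, s_l,
                         beta=0.1, lam=1.0):
    """ASSUMPTION: the rule signal enters as a margin lam * (s_w - s_l), so the
    preferred report must win by more when it satisfies more rules than the
    rejected one. The paper's actual formulation may differ."""
    return dpo_loss(logp_w, logp_l, ref_w, ref_l, beta, margin=lam * (s_w - s_l))


# Hypothetical rule and reports, expressed as sets of predicates.
rules = [HornRule(frozenset({"consolidation_present"}), "pneumonia_likely")]
s_chosen = rule_satisfaction(rules, {"consolidation_present"}, {"pneumonia_likely"})
s_rejected = rule_satisfaction(rules, {"consolidation_present"}, {"no_acute_disease"})
loss = rule_guided_dpo_loss(-12.0, -15.0, -13.0, -14.0, s_chosen, s_rejected)
```

In this sketch the rule scores only shape the margin; a trained verifier producing soft scores in [0, 1] could be dropped in for `rule_satisfaction` without changing the loss function.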
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 24634