Abstract: The complexity of medical image data, combined with the variability of natural language generation, often leads to inconsistencies, hallucinations, and a lack of clinical grounding, especially in automatically generated radiology reports. To address these challenges, we introduce a task-specific symbolic-constraint preference optimization technique tailored for radiology report generation. A typical radiology report comprises findings and an impression: the findings capture the complex visual information in the medical image, for example a chest X-ray, and the impression is the conclusion implied by them. Our framework leverages this structure to design clinical rules from existing findings and impressions, connecting each finding to its impression as a Horn rule. These rules act as an additional, interpretable supervision signal, guiding the preference optimization of vision-language models (VLMs) toward outputs that are both fluent and clinically coherent. Unlike conventional preference optimization, which relies solely on lexical preferences, our approach enforces alignment with clinically meaningful predicates such as the presence, absence, or severity of key findings. A central feature of this framework is its ability to inject symbolic-constraint guidance during optimization itself, rather than only at evaluation time. Experimental results on benchmark datasets, including MIMIC-CXR-JPG and IU-Xray, demonstrate that our approach substantially improves factual accuracy and overall report quality compared to zero-shot and standard DPO baselines, with significant gains across lexical and semantic metrics. These results highlight the promise of clinically interpretable preference optimization as a pathway toward trustworthy radiology report generation in medical AI.
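To make the mechanism in the abstract concrete, the sketch below shows how Horn rules of the form "finding predicate implies impression predicate" could score candidate reports, with the higher-scoring candidate preferred when building DPO preference pairs. All rule contents, predicate names, and helper functions here are illustrative assumptions, not the paper's actual rule set or implementation.

```python
# Hypothetical Horn rules: if the findings section asserts the
# antecedent predicate, the impression must assert the consequent.
# These example predicates are invented for illustration.
HORN_RULES = [
    ("cardiomegaly_present", "enlarged_heart_impression"),
    ("effusion_present", "pleural_effusion_impression"),
]

def satisfied(rule, finding_preds, impression_preds):
    antecedent, consequent = rule
    # A Horn rule A -> B is violated only when A holds and B does not.
    return antecedent not in finding_preds or consequent in impression_preds

def symbolic_score(finding_preds, impression_preds):
    """Fraction of Horn rules a candidate report satisfies."""
    hits = sum(satisfied(r, finding_preds, impression_preds) for r in HORN_RULES)
    return hits / len(HORN_RULES)

# Toy candidates, represented by already-extracted clinical predicates.
coherent = symbolic_score({"cardiomegaly_present"}, {"enlarged_heart_impression"})
incoherent = symbolic_score({"cardiomegaly_present"}, set())

# The higher-scoring candidate would be labeled "chosen" and the lower
# "rejected" when constructing preference pairs for DPO training.
assert coherent > incoherent
```

In this toy example the clinically coherent candidate satisfies both rules (score 1.0), while the candidate whose impression omits the implied conclusion violates one rule (score 0.5), giving the interpretable preference signal the abstract describes.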
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lei_Wang13
Submission Number: 8643