Keywords: Large language model, Radiology Impression Generation, Alignment framework, Reinforcement learning
TL;DR: We present RGRO, which uses reinforcement learning to better align large language models with the intricate requirements of radiological practice by framing report generation as a sequence of decision-making stages.
Abstract: Large language models (LLMs) are typically specialized for domain tasks through supervised fine-tuning, which optimizes LLMs for likelihood-based objectives. While supervised fine-tuning enables LLMs to generate text that conforms to the language style of a specific domain, such as radiology, it often falls short in enhancing the model's ability to perform detailed diagnostic reasoning or tailor reports for individual patients. In this paper, we explore the use of reinforcement learning to better align LLMs with the intricate requirements of radiological practice. By framing the report generation process as sequential decision-making stages, we present Radiology-Guided Reinforcement Optimization (RGRO), a tailored policy optimization framework designed specifically for medical language tasks. RGRO moves beyond conventional likelihood-based training by directly optimizing for radiology-specific objectives, including consistency with radiology findings and adherence to established professional guidelines. Our empirical evaluations demonstrate that RGRO significantly enhances the diagnostic precision and clinical utility of radiology reports generated by LLMs, outperforming supervised fine-tuning methods and state-of-the-art models. Furthermore, RGRO enables the seamless integration of expert radiologist feedback and external diagnostic tools, all without the need for large-scale annotated datasets.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12918