Keywords: Large language model, Radiology Impression Generation, Alignment framework, Reinforcement learning
TL;DR: We present RGRO, a reinforcement learning framework that better aligns large language models with the intricate requirements of radiological practice by framing report generation as a sequence of decision-making stages.
Abstract: Large language models (LLMs) are typically specialized for domain tasks through supervised fine-tuning, which optimizes them for likelihood-based objectives. While supervised fine-tuning enables LLMs to generate text that conforms to the language style of a specific domain, such as radiology, it often falls short in enhancing the model's ability to perform detailed diagnostic reasoning or to tailor reports to individual patients. In this paper, we explore the use of reinforcement learning to better align LLMs with the intricate requirements of radiological practice. By framing report generation as a sequence of decision-making stages, we present Radiology-Guided Reinforcement Optimization (RGRO), a policy optimization framework designed specifically for medical language tasks. RGRO moves beyond conventional likelihood-based training by directly optimizing for radiology-specific objectives, including consistency with radiology findings and adherence to established professional guidelines. Our empirical evaluations demonstrate that RGRO significantly enhances the diagnostic precision and clinical utility of radiology reports generated by LLMs, outperforming supervised fine-tuning methods and state-of-the-art models. Furthermore, RGRO enables the seamless integration of expert radiologist feedback and external diagnostic tools, all without the need for large-scale annotated datasets.
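The contrast the abstract draws between likelihood-based training and reward-driven optimization can be sketched with a generic formulation. This is not necessarily RGRO's exact objective. Here x is the input (e.g., radiology findings), y* is a reference report, and the policy \pi_\theta generates the report token by token, each token being one decision step. The composite reward R, its components R_find (consistency with findings) and R_guide (guideline adherence), and the weights \lambda_1, \lambda_2 are hypothetical placeholders for the objectives the abstract names. The gradient shown is the standard REINFORCE estimator with a baseline b that does not depend on y.

```latex
% Supervised fine-tuning: negative log-likelihood of the reference report y^*
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log \pi_\theta\left(y^*_t \mid x, y^*_{<t}\right)

% RL view: state (x, y_{<t}), action y_t; maximize expected report-level reward
J(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\left[ R(x, y) \right]

% Hypothetical composite reward (components and weights are assumptions)
R(x, y) = \lambda_1 \, R_{\text{find}}(x, y) + \lambda_2 \, R_{\text{guide}}(y)

% REINFORCE policy gradient with a y-independent baseline b
\nabla_\theta J(\theta) = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\left[ \left( R(x, y) - b \right) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta\left(y_t \mid x, y_{<t}\right) \right]
```

Under this sketch, supervised fine-tuning rewards only token-level agreement with y*, while the RL objective can score a whole generated report against clinical criteria, even when no single reference report exists.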
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12918