Large language models (LLMs) are typically specialized for domain tasks through supervised fine-tuning, which optimizes them for likelihood-based objectives. While supervised fine-tuning enables LLMs to generate text that conforms to the language style of a specific domain, such as radiology, it often falls short of improving the model's detailed diagnostic reasoning or its ability to tailor reports to individual patients. In this paper, we explore the use of reinforcement learning to better align LLMs with the intricate requirements of radiological practice. By framing report generation as a sequence of decision-making stages, we present Radiology-Guided Reinforcement Optimization (RGRO), a policy optimization framework designed specifically for medical language tasks. RGRO moves beyond conventional likelihood-based training by directly optimizing for radiology-specific objectives, including consistency with radiological findings and adherence to established professional guidelines. Our empirical evaluations demonstrate that RGRO significantly improves the diagnostic precision and clinical utility of LLM-generated radiology reports, outperforming supervised fine-tuning and state-of-the-art models. Furthermore, RGRO enables the seamless integration of expert radiologist feedback and external diagnostic tools without requiring large-scale annotated datasets.
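To make the framing concrete, the sketch below shows one way such a policy-gradient update over generated reports could look in Python. It is not the paper's published algorithm: the REINFORCE-style objective, the reward weights `w_cons` and `w_guide`, and the helper functions `findings_consistency` and `guideline_adherence` are illustrative assumptions, standing in for the radiology-specific objectives the abstract names (consistency with findings, adherence to guidelines).

```python
# Hypothetical sketch of an RGRO-style policy-gradient step for a Hugging Face
# causal LM. Reward terms, weights, and helper names are assumptions, not the
# paper's implementation.
import torch
import torch.nn.functional as F

def findings_consistency(report: str, findings: str) -> float:
    """Assumed reward: fraction of reference findings mentioned in the report."""
    terms = [t.strip() for t in findings.lower().split(";") if t.strip()]
    return sum(t in report.lower() for t in terms) / max(len(terms), 1)

def guideline_adherence(report: str) -> float:
    """Assumed reward: checks that required report sections are present."""
    required = ("findings:", "impression:")
    return sum(s in report.lower() for s in required) / len(required)

def rgro_update(policy, tokenizer, prompt: str, findings: str,
                optimizer, w_cons: float = 0.7, w_guide: float = 0.3):
    """One REINFORCE-style step: sample a report, score it with
    radiology-specific rewards, and reinforce the sampled tokens."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sample a candidate report from the current policy (autoregressive LLM).
    sampled = policy.generate(**inputs, do_sample=True, max_new_tokens=256)
    report = tokenizer.decode(sampled[0], skip_special_tokens=True)

    # Scalar reward combining the two radiology-specific objectives.
    reward = (w_cons * findings_consistency(report, findings)
              + w_guide * guideline_adherence(report))

    # Log-probability of the sampled sequence under the current policy.
    logits = policy(sampled).logits[:, :-1, :]
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, sampled[:, 1:].unsqueeze(-1)).squeeze(-1)

    # REINFORCE objective: maximize reward-weighted log-likelihood. A real
    # implementation would mask prompt tokens and subtract a baseline to
    # reduce variance; both are omitted here for brevity.
    loss = -(reward * token_logp.sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return report, reward
```

Because the reward is computed on the decoded report rather than on token likelihoods, this style of update is also a natural place to plug in the expert radiologist feedback and external diagnostic tools the abstract mentions: either could replace or augment the two assumed reward functions without touching the update rule.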