Reinforcement Learning for Clinical Reasoning: Aligning LLMs with ACR Imaging Criteria

ICLR 2026 Conference Submission13915 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, Readers: Everyone, License: CC BY 4.0

Keywords: Medical Imaging, Reasoning, ACR Appropriateness Criteria, GRPO

TL;DR: This paper introduces MedReason-Embed, an RL-trained reasoning agent that outperforms larger models in imaging guideline adherence, enabling accurate, scalable, and trustworthy clinical decision support.

Abstract: Medical imaging has revolutionized diagnosis, yet unnecessary procedures are rising, exposing patients to radiation and stress, limiting equitable access, and straining healthcare systems. The American College of Radiology (ACR) Appropriateness Criteria®, developed through extensive multidisciplinary review, provide evidence-based guidance but remain underutilized. Leveraging advances in LLM reasoning, we introduce a Reasoning Agent trained with Reinforcement Learning (RL), specifically Group Relative Policy Optimization (GRPO), to replicate the expert clinical reasoning underlying the ACR Criteria. We present a novel RL approach for structured medical reasoning, systematically comparing reasoning-focused reward functions and evidence integration strategies. Our lightweight 8B model, MedReason-Embed, improves macro F1 by 18% over the baseline, shows stronger reasoning alignment, and outperforms both larger and alternatively trained models, demonstrating that reasoning-based supervision enables efficient, trustworthy clinical AI. Building on this, we design a modular end-to-end agentic architecture that automates imaging referrals: it maps diagnoses to ICD codes, retrieves PubMed evidence, and recommends optimal procedures. Crucially, the ability to generalize beyond static ACR guidelines not only helps clinicians handle out-of-distribution cases but also supports scaling the guideline development process itself, potentially reducing the substantial effort required to create and update the guidelines. This work shows the potential of reasoning-focused RL within agentic architectures to deliver transparent, scalable, and reliable clinical decision support. Our code is available at: https://anonymous.4open.science/r/agentic-imaging-recommender-iclr-877D
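GRPO, named in the abstract, samples a group of completions for each prompt, scores each one with a reward function, and converts those rewards into advantages by normalizing within the group, so no learned value function is needed. The sketch below illustrates only that group-relative normalization step. It assumes a hypothetical exact-match reward on the final ACR appropriateness label; `label_reward`, `group_relative_advantages`, the use of the population standard deviation, and `eps` are illustrative assumptions, not the reward functions or code from this submission.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population std (pstdev) is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def label_reward(predicted: str, reference: str) -> float:
    """Hypothetical outcome reward: 1.0 if the predicted appropriateness label
    matches the reference ACR rating (case-insensitive), else 0.0.
    This is NOT one of the paper's reasoning-focused reward functions."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


# One prompt (a clinical scenario plus a candidate imaging procedure) and four
# sampled final answers from the policy model.
reference = "Usually appropriate"
completions = [
    "Usually appropriate",
    "May be appropriate",
    "usually appropriate",
    "Usually not appropriate",
]
rewards = [label_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)

print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])   # [1.0, -1.0, 1.0, -1.0]
```

Correct answers receive positive advantages and incorrect ones negative advantages relative to their own group. If every completion in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal. In full GRPO these advantages then weight a clipped policy-ratio objective, typically with a KL penalty toward a reference model.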

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 13915