Keywords: Clinical SOAP Note Generation, Reinforcement Learning, GRPO, Reward Modeling
TL;DR: This work presents an evaluation-integrated reinforcement learning framework for long-form clinical text generation using Group Relative Policy Optimization (GRPO) with DocLens claim-level rewards.
Abstract: Automating clinical documentation with large language models requires precise alignment with priorities such as completeness and factual grounding. We present an evaluation-integrated reinforcement learning framework for long-form clinical text generation that couples Group Relative Policy Optimization (GRPO) with DocLens, a claim-level evaluator that provides deterministic, dialogue-grounded rewards. Our method directly optimizes factual grounding and completeness without training a separate reward model or relying on human-authored references. Empirically, the approach improves clinical note quality and reduces training cost via a simple reward-gating strategy. An independent GPT-5 qualitative evaluation further supports these gains, showing a higher preference for GRPO outputs in factuality, completeness, and brevity, with fewer omissions and hallucinations. Because the benchmarks are relatively clean and the base model is already well aligned, these improvements likely represent a conservative lower bound. The framework is scalable to real-world settings and can incorporate custom objectives such as guideline adherence or billing preferences.
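The abstract does not spell out the gating mechanism, but the core GRPO idea it relies on can be sketched: rewards from an external evaluator are normalized within each sampled group of generations, and a gate skips groups that carry little learning signal. The function names, the threshold value, and the gating criterion below are illustrative assumptions, not the paper's implementation.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize claim-level rewards within one
    group of generations sampled for the same prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All generations scored identically: no relative signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

def gate_group(rewards, threshold=0.95):
    """Hypothetical reward gate: skip the policy update for groups whose
    mean evaluator reward already exceeds a threshold, saving the cost of
    gradient steps that would contribute little."""
    return statistics.mean(rewards) < threshold
```

In this sketch, `rewards` would come from a claim-level evaluator such as DocLens scoring each generated note against the source dialogue; the actual gating rule used in the paper may differ.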
Primary Area: reinforcement learning
Submission Number: 20765