ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection
Keywords: Medical Alignment, Medical Benchmark, Preference Learning, Reward Modeling
Abstract: Aligning Large Language Models (LLMs) with high-stakes medical standards remains a significant challenge, primarily due to the mismatch between coarse-grained preference signals and the complex, multi-dimensional nature of clinical protocols. To bridge this gap, we introduce $\textit{ProMedical}$, a unified alignment framework grounded in fine-grained clinical criteria. We first construct $\textit{ProMedical-Preference-50k}$, a dataset generated via a human-in-the-loop pipeline that augments medical instructions with rigorous, physician-derived rubrics. Leveraging this corpus, we propose the Explicit Criteria Injection paradigm to train a multi-dimensional reward model. Unlike traditional scalar reward models, our approach explicitly disentangles safety constraints from general proficiency, enabling precise guidance during reinforcement learning. To rigorously validate this framework, we establish $\textit{ProMedical-Bench}$, a held-out evaluation suite anchored by double-blind expert adjudication. Empirical evaluations demonstrate that optimizing the $\texttt{Qwen3-8B}$ base model via $\textit{ProMedical-RM}$-guided GRPO yields substantial gains, improving overall accuracy by 22.3\% and safety compliance by 21.7\%, effectively rivaling proprietary frontier models. Furthermore, the aligned policy generalizes robustly to external benchmarks, achieving performance comparable to state-of-the-art models on UltraMedical. We publicly release our datasets, reward models, and benchmarks to facilitate reproducible research in safety-aware medical alignment.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Clinical and biomedical language models, medical question answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 3376