Abstract: Ensuring factual consistency and reliable reasoning remains a
critical challenge for medical vision-language models. We in-
troduce MEDFACT-R1, a two-stage framework that integrates
external knowledge grounding with reinforcement learning
to improve factual medical reasoning. The first stage uses
pseudo-label supervised fine-tuning (SFT) to incorporate
external factual expertise, while the second stage applies Group
Relative Policy Optimization (GRPO) with four tailored
factual reward signals to encourage self-consistent reasoning at
deployment time without relying on external
retrieval-augmented generation (RAG). Across
three public medical QA benchmarks, MEDFACT-R1 deliv-
ers up to 22.5% absolute improvement in factual accuracy
over previous state-of-the-art methods. Ablation studies
highlight the necessity of pseudo-label SFT cold start and
validate the contribution of each GRPO reward, underscoring
the synergy between knowledge grounding and RL-driven
reasoning for trustworthy medical AI. Code is released at
https://github.com/Garfieldgengliang/MEDFACT-R1.
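
As a rough illustration of the second stage, the sketch below shows the group-relative advantage computation commonly associated with GRPO: several responses are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. This is a minimal sketch under stated assumptions, not the released implementation. The function names, the equal weighting of the four reward signals, and the numeric scores are all hypothetical placeholders; the abstract does not specify how the rewards are defined or combined.

```python
from statistics import mean, pstdev


def combine_rewards(signal_scores, weights=None):
    """Combine per-signal scores for one response into a scalar reward.

    Assumption: a plain weighted sum with equal weights by default.
    The actual reward definitions and weighting are not given here.
    """
    if weights is None:
        weights = [1.0] * len(signal_scores)
    return sum(w * s for w, s in zip(weights, signal_scores))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for being better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores: 3 sampled responses x 4 placeholder reward signals.
    group_scores = [
        [1.0, 0.5, 1.0, 0.0],
        [0.0, 0.5, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
    ]
    rewards = [combine_rewards(scores) for scores in group_scores]
    print(group_relative_advantages(rewards))
```

Because the advantages are centered within each group, they sum to approximately zero, and a group whose responses all receive the same reward contributes no learning signal.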