MEDFACT-R1: TOWARDS FACTUAL MEDICAL REASONING VIA PSEUDO-LABEL AUGMENTATION

Glory Rongyu CHEN

Published: 30 Apr 2026, Last Modified: 28 Jan 2026, ICASSP 2026, CC BY 4.0

Abstract: Ensuring factual consistency and reliable reasoning remains a critical challenge for medical vision-language models. We introduce MEDFACT-R1, a two-stage framework that integrates external knowledge grounding with reinforcement learning to improve factual medical reasoning. The first stage uses pseudo-label supervised fine-tuning (SFT) to incorporate external factual expertise, while the second stage applies Group Relative Policy Optimization (GRPO) with four tailored factual reward signals to encourage self-consistent reasoning at deployment time without relying on external retrieval-augmented generation (RAG). Across three public medical QA benchmarks, MEDFACT-R1 delivers up to a 22.5% absolute improvement in factual accuracy over previous state-of-the-art methods. Ablation studies highlight the necessity of the pseudo-label SFT cold start and validate the contribution of each GRPO reward, underscoring the synergy between knowledge grounding and RL-driven reasoning for trustworthy medical AI. Code is released at https://github.com/Garfieldgengliang/MEDFACT-R1.
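The abstract does not define the four factual reward signals or say how they are combined, so the following is only a rough sketch of the generic GRPO step under stated assumptions. It assumes each sampled response receives four scalar reward components that are summed with equal weights; the helper names `combine_rewards` and `group_relative_advantages`, the weights, and the example scores are all hypothetical. Advantages are then computed in the standard GRPO way, by normalizing each response's reward against the mean and standard deviation of the group of responses sampled for the same question.

```python
import statistics
from typing import Sequence


def combine_rewards(components: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of one response's reward components (hypothetical weighting)."""
    if len(components) != len(weights):
        raise ValueError("components and weights must have the same length")
    return sum(c * w for c, w in zip(components, weights))


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (r_i - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one question; each row holds four hypothetical
# reward components in [0, 1] (placeholders, not the paper's actual rewards).
group = [
    [1.0, 1.0, 0.5, 1.0],
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
weights = [1.0, 1.0, 1.0, 1.0]  # assumed equal weighting of the four rewards

rewards = [combine_rewards(row, weights) for row in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Responses scoring above their group's mean receive positive advantages and those below receive negative ones, which is how GRPO avoids a separately learned value model; the small `eps` keeps a group with identical rewards from dividing by zero.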