Keywords: Medical Vision-Language Models; Structured Visual Reasoning; Reinforcement Learning; Self-Corrective Training
Abstract: Reinforcement learning (RL) can improve interpretability in medical vision-language models (VLMs), but medical visual reasoning remains challenging without structured guidance. Existing supervised fine-tuning plus reinforcement learning (SFT+RL) approaches often learn task-specific image-to-answer mappings, misaligning visual evidence with textual reasoning and leading to shortcut reasoning. To address these challenges, we propose MEDSAGE, a medical VLM framework built upon structured reasoning sequences. MEDSAGE introduces a structured path enhancement strategy that formulates medical visual reasoning as a sequence of clinically meaningful stages (localization, visual analysis, knowledge matching, and final decision), thereby guiding models to explore plausible reasoning paths. We construct two training datasets, \textbf{SAGE-sft20K} and \textbf{SAGE-rl10K}, to support this training paradigm. Within this framework, SFT induces consistent structured reasoning across tasks, while self-corrective RL further improves answer correctness by encouraging self-check-guided revision of erroneous predictions during training. Experiments on five medical VQA benchmark datasets show that MEDSAGE achieves competitive or improved performance across diverse tasks. Additional analyses further examine robustness and reasoning faithfulness.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Medical Vision-Language Models; Structured Visual Reasoning; Reinforcement Learning; Self-Corrective Training
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 5196