Keywords: Medical Visual Question Answering, Medical Reasoning, Principle-Based Iterative Learning
Abstract: Effective reasoning over complex visual data and medical knowledge is critical for medical Visual Question Answering (VQA). While multimodal large language models (MLLMs) show promise, their reasoning capabilities remain fundamentally capped by the static nature of current training paradigms. Existing reinforcement learning (RL) methods act as fixed tutors, providing unchanging guidance that often optimizes output format without encoding explicit medical expertise, leading to performance plateaus and reward hacking. Drawing inspiration from how human experts continuously refine clinical principles, we introduce \textbf{Evo-PI}, a framework that operationalizes a synergistic loop of evolving principle-guided learning. Evo-PI generates, applies, and iteratively refines abstract medical principles, which serve as dynamic rewards. This co-evolution of the reasoning model and its guiding principles enables MLLMs to develop more robust and clinically aligned reasoning. Across eight medical VQA benchmarks, Evo-PI consistently improves performance over diverse backbones and RL algorithms, achieving accuracy gains of up to 24.6\%. Our results establish evolving principle scaling as a scalable and generalizable paradigm for aligning MLLMs with expert-like reasoning, advancing the path toward trustworthy medical AI.
Primary Area: reinforcement learning
Submission Number: 17973