PathFinder: Graph-structured Reasoning for Medical Visual Question Answering

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Medical VLM, Reasoning-based Model, Reinforcement Learning
Abstract: Medical Visual Question Answering (MVQA) aims not only to predict correct diagnoses but also to provide explicit, clinically grounded reasoning that enhances interpretability, fosters clinician trust, and supports AI-assisted decision-making. Despite recent advances, the explanations produced by existing MVQA systems are often incomplete and non-causal, neglecting key evidence, intermediate steps, and alternative hypotheses. In this paper, we present PathFinder, a graph-structured reasoning framework in which medical entities are represented as nodes and causal/evidential relations as edges, enabling systematic traversal of diagnostic pathways. PathFinder defines two structural reasoning dimensions: step-wise exploration, which encourages the model to traverse intermediate entities along causal links, and branch-wise exploration, which encourages it to explore alternative diagnostic routes and rule out unlikely options. Further, we introduce Graph-GRPO, which integrates graph-structured supervision with two process-level rewards: a Step Reward for causally coherent reasoning and a Branch Reward for systematic exploration of alternatives, complemented by an outcome-accuracy reward. Experiments on seven multimodal and seven text-only benchmarks consistently show that PathFinder outperforms state-of-the-art methods while producing reasoning that is causally coherent and structurally comprehensive. Code will be released.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5421