Keywords: Medical visual question answering, Reasoning trajectory clustering, Self-improvement learning
TL;DR: We enhance medical VQA through COMCTS-generated reasoning annotations and a self-improvement framework that filters generated reasoning paths via DTW-based trajectory clustering.
Abstract: While large language models have shown promise in medical applications, their performance in medical visual question answering (VQA) remains limited by insufficient vision-language reasoning capabilities. We address this challenge through two complementary approaches. First, we generate high-quality reasoning annotations for existing medical VQA datasets using the COMCTS algorithm. Second, we introduce a self-improvement framework that bootstraps model performance by learning from the model's own outputs, guided by a small set of high-quality reasoning samples. To optimize this self-improvement process, we propose a novel filtering mechanism based on K-medoids clustering of reasoning trajectories, which employs Dynamic Time Warping (DTW) distances to select the most effective generated reasoning paths. Our approach yields significant improvements on medical VQA tasks. We release both the COMCTS-generated reasoning datasets and our code to support future research. Our code is available at https://anonymous.4open.science/r/SelfImproving-MedicalVQA-5507
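As a rough sketch of the filtering idea described in the abstract (not the paper's implementation; representing each reasoning trajectory as a sequence of per-step scalar scores is an assumption made here for illustration), K-medoids clustering under DTW distance can be written as:

```python
import random

def dtw(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def k_medoids(seqs, k, iters=20, seed=0):
    """Naive K-medoids over variable-length sequences using DTW distance."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(seqs)), k)
    clusters = {}
    for _ in range(iters):
        # Assign each trajectory to its nearest medoid under DTW.
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(seqs):
            nearest = min(medoids, key=lambda m: dtw(s, seqs[m]))
            clusters[nearest].append(i)
        # Re-pick each medoid as the member minimizing total in-cluster DTW cost.
        new_medoids = [
            min(members, key=lambda c: sum(dtw(seqs[c], seqs[o]) for o in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids, clusters

# Hypothetical trajectories: two tight groups of per-step scores.
trajectories = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 5.0, 4.9]]
medoids, clusters = k_medoids(trajectories, k=2)
```

The medoid of each cluster is itself a generated reasoning path, so selecting medoids (rather than centroids, which DTW does not define) directly yields representative trajectories for the self-improvement loop.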
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9134