S-Chain: Structured Visual Chain-of-Thought for Medicine

Khai Le-Duc; Phuong T.H. Trinh; Duy Minh Ho Nguyen; Tien-Phat Nguyen; Nghiem Tuong Diep; An Ngo; Tung Vu; Trinh Vuong; Anh-Tien Nguyen; Nguyen Dinh Mau; Van Trung Hoang; Khai-Nguyen Nguyen; Hy Nguyen; Chris Ngo; Anji Liu; Nhat Ho; Anne-Christin Hauschild; Khanh Xuan Nguyen; Thanh Nguyen-Tang; Pengtao Xie; Daniel Sonntag; James Zou; Mathias Niepert; Anh Totti Nguyen


20 Sept 2025 (modified: 21 Nov 2025); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: visual chain-of-thought, medical MLLMs

TL;DR: A new, large-scale expert-annotated dataset for medical visual question answering

Abstract: Faithful reasoning in medical vision–language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale, expert-level dataset has yet captured stepwise reasoning with precise visual grounding. We introduce S-Chain, the first large-scale dataset of 12,000 expert-annotated medical images with bounding boxes and structured visual CoT (SV-CoT), which explicitly links visual regions to reasoning steps. The dataset further supports 16 languages, totaling over 700k VQA pairs for broad multilingual applicability. Using S-Chain, we benchmark state-of-the-art medical VLMs (ExGra-Med, LLaVA-Med) and general-purpose VLMs (Qwen2.5-VL, InternVL2.5), showing that SV-CoT supervision significantly improves interpretability, grounding fidelity, and robustness. Beyond benchmarking, we study how SV-CoT interacts with retrieval-augmented generation (RAG), revealing how domain knowledge and visual grounding interact during autoregressive reasoning. Finally, we propose a new mechanism that strengthens the alignment between visual evidence and reasoning, improving both reliability and efficiency. S-Chain establishes a new benchmark for grounded medical reasoning and paves the way toward more trustworthy and explainable medical VLMs.
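The abstract describes SV-CoT as explicitly linking each reasoning step to a visual region and reports gains in "grounding fidelity" without spelling out a format or metric. The sketch below is purely illustrative and assumes a hypothetical record schema (`SVCoTRecord`, `ReasoningStep`, boxes as `(x_min, y_min, x_max, y_max)` pixels) and a hypothetical IoU-based `grounding_score`; none of these names or the threshold come from the S-Chain release, and the paper's actual data format and evaluation may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class ReasoningStep:
    """One SV-CoT step: a textual rationale tied to an image region (hypothetical schema)."""
    text: str
    box: Box


@dataclass
class SVCoTRecord:
    """An illustrative S-Chain-style sample; field names are assumptions, not the released format."""
    image_id: str
    question: str
    steps: List[ReasoningStep] = field(default_factory=list)
    answer: str = ""
    language: str = "en"  # the dataset covers 16 languages; code choice here is an assumption


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_score(gold: SVCoTRecord, predicted_boxes: List[Box], threshold: float = 0.5) -> float:
    """Hypothetical metric: fraction of gold steps whose predicted box has IoU >= threshold.

    Steps without a corresponding prediction count as misses, because the
    denominator is always the number of gold steps.
    """
    if not gold.steps:
        return 0.0
    hits = sum(
        1
        for step, pred in zip(gold.steps, predicted_boxes)
        if iou(step.box, pred) >= threshold
    )
    return hits / len(gold.steps)


if __name__ == "__main__":
    record = SVCoTRecord(
        image_id="img_0001",
        question="Is there an abnormal region?",
        steps=[
            ReasoningStep("Step 1: locate the suspicious region", (10, 10, 50, 50)),
            ReasoningStep("Step 2: inspect the region boundary", (60, 60, 100, 100)),
        ],
        answer="yes",
    )
    # The first prediction matches step 1 exactly; the second misses step 2.
    print(grounding_score(record, [(10, 10, 50, 50), (0, 0, 20, 20)]))  # prints 0.5
```

Scoring each step's box separately, rather than only the final answer, reflects the abstract's emphasis on tying every reasoning step to visual evidence; a real evaluation might instead use per-step matching, multiple boxes per step, or a different overlap threshold.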

Primary Area: datasets and benchmarks

Submission Number: 23239
