Structural Bottleneck Reasoning: Efficient Medical VQA via Concept Alignment

Published: 28 May 2026, Last Modified: 28 May 2026 | ICML 2026 FM4LS Workshop Poster | Everyone | CC BY 4.0
Keywords: medical visual chain-of-thought, XAI for healthcare, concept alignment
Abstract: Medical visual Chain-of-Thought (CoT) reasoning is fundamental to trustworthy predictions: by externalizing the model's internal logic, it lets clinicians verify that a diagnosis is derived from valid clinical indicators rather than spurious correlations. However, the expert-driven CoT paradigm is bottlenecked by the prohibitive cost of dense manual annotation. To address this, we introduce \textsc{StructAlign}, a framework that integrates expert knowledge through a predefined list of clinical concepts. Rather than requiring expensive sentence-level supervision, our method uses these concepts as \textit{latent causal mediators} to anchor the model's reasoning. This prevents \textit{reasoning collapse} and ensures diagnostic consistency, providing a scalable, cost-effective path toward interpretable AI in high-stakes medicine. \textsc{StructAlign} maintains high performance even in data-scarce regimes (10\% and 40\%), with improvements of up to +24.64\%, indicating that conceptual grounding can restore high-fidelity diagnostic reasoning while reducing reliance on exhaustive expert supervision.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 123