Dual-Route Mental Imagery for Robust VLM-based Medical Image Diagnosis

ICLR 2026 Conference Submission 15641 Authors

19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Vision-Language Models, Medical Image Diagnosis, Mental Imagery, Chest X-ray, Cognitive-inspired AI
TL;DR: We propose Dual-Route Mental Imagery, a prototype-conditioned reasoning framework that enhances VLM-based chest X-ray diagnosis with greater robustness and accuracy, achieving up to 95.9% accuracy and rivaling expert-designed networks.
Abstract: Despite the rapid progress of large vision-language models (VLMs), their diagnostic predictions in medical imaging remain brittle and often clinically inconsistent. Inspired by how radiologists rely on prototype-based mental imagery, we propose Dual-Route Mental Imagery, the first framework that formalizes prototype-conditioned reasoning for VLMs. Our method conditions diagnosis on (patient, prototype) pairs, instantiating two complementary reasoning routes—healthy and diseased—that yield interpretable, reference-level traces and expose uncertainty when the two routes conflict. On chest X-ray benchmarks, our approach delivers substantial gains: on the Kermany dataset, it achieves 92.6% accuracy, on par with the expert-designed network LungConVT-Net, and improves further to 95.9% with uncertainty handling, while substantially outperforming single-image VLM inference. These results demonstrate that prototype-guided dual-route mental imagery not only enhances the robustness and accuracy of VLM-based diagnosis, but also provides a novel bridge between cognitive science and AI for healthcare.
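The dual-route idea described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: `cosine` stands in for whatever comparison the VLM performs between the patient image and a reference prototype, and the `margin` threshold for declaring the two routes in conflict is an assumed parameter.

```python
# Hypothetical sketch of prototype-conditioned, dual-route inference.
# The paper conditions a VLM on (patient, prototype) pairs; here a toy
# cosine similarity over feature vectors stands in for the VLM's judgment
# so the example runs end to end.
import math

def cosine(u, v):
    # Standard cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def dual_route_diagnose(patient, healthy_proto, diseased_proto, margin=0.05):
    # Route 1: compare the patient against the healthy prototype.
    s_healthy = cosine(patient, healthy_proto)
    # Route 2: compare the patient against the diseased prototype.
    s_diseased = cosine(patient, diseased_proto)
    # When the two routes conflict (scores within `margin`), expose
    # uncertainty instead of forcing a label.
    if abs(s_healthy - s_diseased) < margin:
        return "uncertain", (s_healthy, s_diseased)
    label = "healthy" if s_healthy > s_diseased else "diseased"
    return label, (s_healthy, s_diseased)
```

The returned score pair plays the role of the "reference-level trace": a reader can inspect which prototype the patient image was judged closer to, and the `uncertain` branch corresponds to the uncertainty handling that lifts accuracy from 92.6% to 95.9% in the paper's experiments.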
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15641