EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography

04 Sept 2025 (modified: 26 Jan 2026)ICLR 2026 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Echocardiography, vision-language model, ultrasound
Abstract: Echocardiography is the most widely used imaging modality in cardiology, yet its interpretation remains labor-intensive and inherently multimodal, which requires view recognition, quantitative measurements, qualitative assessments, and guideline-based reasoning. While recent vision–language models (VLMs) have achieved broad success in natural images and certain medical domains, their potential in echocardiography has been limited by the lack of large-scale, clinically grounded image–text datasets and the absence of measurement-based reasoning central to echo interpretation. We introduce EchoGround-MIMIC, the first measurement-grounded multimodal echocardiography dataset, comprising 19,065 image–text pairs from 1,572 patients with standardized views, structured measurements, measurement-grounded captions, and guideline-derived disease labels. Building on this resource, we propose EchoVLM, a vision–language model that incorporates two novel pretraining objectives: (i) a view-informed contrastive loss that encodes the view-dependent structure of echocardiographic imaging, and (ii) a negation-aware contrastive loss that distinguishes clinically critical negative from positive findings. Across five types of clinical applications with 36 tasks spanning multimodal disease classification, image–text retrieval, view classification, chamber segmentation, and landmark detection, EchoVLM achieves state-of-the-art performance (86.5\% AUC in zero-shot disease classification and 95.1\% accuracy in view classification). We demonstrate that clinically grounded multimodal pretraining yields transferable visual representations and establish EchoVLM as foundation model for end-to-end echocardiography interpretation. We will release EchoGround-MIMIC and data curation code, enabling reproducibility and further research in multimodal echocardiography interpretation.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 1886
Loading