Evaluating Open-Source Med-VLMs for Histopathologic Interpretation with Clinical Context

Amir Nazem; Xiao Li; Feng Yin; Shilin Zhao; Yang Ding; Haichun Yang; Yuankai Huo; Yaohong Wang

Published: 09 May 2026, Last Modified: 12 May 2026. Venue: MIDL 2026 - Short Papers Poster. License: CC BY 4.0

Keywords: Artificial intelligence (AI), foundation models, medical vision-language models

Abstract: The integration of artificial intelligence into pathology has driven research interest in evaluating medical vision-language models (Med-VLMs) for histopathologic interpretation. In this study, we compared two open-source Med-VLMs, MedGemma-4B-IT and Qwen2.5-VL-7B, for their ability to generate diagnostic interpretations from histopathology images with and without accompanying clinical context. A total of 100 cases were curated from the Pathology Outlines question bank and reviewed by two pathologists to generate controlled inputs, including multiple-choice differentials and removal of image-descriptive cues. Each model was evaluated across four experimental conditions with varying input modalities. Model outputs (diagnosis and reasoning) were scored by a pathologist on a 0–4 scale, and performance differences were analyzed using chi-square testing. Both models demonstrated significantly improved performance with the addition of clinical context and/or structured differential diagnoses ($\chi^2 > 10$, $p < 0.001$). The greatest improvement was observed when multiple-choice differentials were provided (MedGemma: +45.9%; Qwen: +35.7%), while clinical context alone yielded more modest gains (MedGemma: +30.6%; Qwen: +24.5%). Qwen exhibited greater robustness to inconsistent clinical information, with smaller performance declines under conflicting inputs (11.2% vs. 23.5%). These findings highlight the importance of structured contextual inputs in enhancing diagnostic performance of Med-VLMs and support the potential of open-source models for privacy-preserving, locally deployable AI systems in pathology.
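As an illustration of the chi-square comparison mentioned in the abstract, the following is a minimal sketch of a Pearson chi-square test on a 2x2 contingency table. It assumes the 0–4 scores are dichotomized into "acceptable" versus "not acceptable" per condition, which the abstract does not state; the authors may have tested the full score distribution instead. The function name `chi_square_2x2` and all counts are hypothetical placeholders, not results from the study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]: rows are conditions, columns are outcomes.
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of condition and outcome.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# HYPOTHETICAL counts for illustration only (not data from the study).
# Columns: [acceptable, not acceptable]
hypothetical_table = [
    [30, 70],  # assumed condition A: image only
    [60, 40],  # assumed condition B: image plus clinical context
]

chi2, p_value = chi_square_2x2(hypothetical_table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

With real data, each row would hold one model's outcome counts under one experimental condition. For tables larger than 2x2 (for example, all five score levels), a library routine such as `scipy.stats.chi2_contingency` would be the more natural choice.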

Submission Number: 32
