Keywords: Artificial intelligence (AI), foundation models, medical vision-language models
Registration Requirement: Yes
Abstract: The integration of artificial intelligence into pathology has driven research interest in evaluating medical vision-language models (Med-VLMs) for histopathologic interpretation. In this study, we compared two open-source Med-VLMs, MedGemma-4B-IT and Qwen2.5-VL-7B, for their ability to generate diagnostic interpretations from histopathology images with and without accompanying clinical context. A total of 100 cases were curated from the Pathology Outlines question bank and reviewed by two pathologists to generate controlled inputs, including multiple-choice differentials and removal of image-descriptive cues. Each model was evaluated across four experimental conditions with varying input modalities. Model outputs (diagnosis and reasoning) were scored by a pathologist on a 0–4 scale, and performance differences were analyzed using chi-square testing. Both models demonstrated significantly improved performance with the addition of clinical context and/or structured differential diagnoses ($\chi^2 > 10$, $p < 0.001$). The greatest improvement was observed when multiple-choice differentials were provided (MedGemma: +45.9%; Qwen: +35.7%), while clinical context alone yielded more modest gains (MedGemma: +30.6%; Qwen: +24.5%). Qwen exhibited greater robustness to inconsistent clinical information, with a smaller performance decline under conflicting inputs than MedGemma (11.2% vs. 23.5%). These findings highlight the importance of structured contextual inputs in enhancing the diagnostic performance of Med-VLMs and support the potential of open-source models for privacy-preserving, locally deployable AI systems in pathology.
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 32