OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

Halil Ibrahim Gulluk; Max Van Puyvelde; Olivier Gevaert

Published: 09 May 2026, Last Modified: 16 May 2026
Venue: MIDL 2026 - Short Papers Poster
License: CC BY 4.0

Keywords: Medical Vision-Language Models, Medical Image Classification, Open Science

TL;DR: A 7B medical VLM pretrained on 14 open datasets achieves state-of-the-art PathVQA BLEU-1, outperforming models up to ∼80× larger, and its vision encoder transfers to 8 classification benchmarks better than those of BiomedCLIP, PMC-CLIP, and PubMedCLIP.

Registration Requirement: Yes

Abstract: We present OpenMedQ, a medical vision-language model pretrained on the broadest fully open medical data mix to date: 14 datasets totaling ∼3.35M pretraining samples that span pathology, radiology, microscopy, and text-only clinical QA. OpenMedQ achieves state-of-the-art BLEU-1 on PathVQA (75.9), outperforming Med-PaLM M variants of up to 562B parameters (∼80× larger), and matches the best reported BLEU-1 on VQA-MED (64.5). When its vision encoder is transferred to 8 unseen medical classification benchmarks under an identical downstream recipe, it obtains the highest average macro-F1 (0.757), compared with BiomedCLIP (0.745), PMC-CLIP (0.745), PubMedCLIP (0.746), and a from-scratch baseline (0.616). We will release the pretrained weights and complete dataset recipes upon acceptance as a reproducible baseline for the community; an interactive demo is already publicly available.
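The two headline metrics can be made concrete with a small sketch. The abstract does not state the exact evaluation protocol, so the code below assumes one common definition of each: sentence-level BLEU-1 as clipped unigram precision times a brevity penalty (lowercase whitespace tokenization, a single reference answer), and macro-F1 as the unweighted mean of per-class F1. The function names `bleu1` and `macro_f1` are hypothetical, not part of any released OpenMedQ code. The last line checks the "∼80×" size ratio from the 562B Med-PaLM M variant and the 7B OpenMedQ size given in the TL;DR.

```python
from collections import Counter
import math


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 (assumed definition): clipped unigram precision x brevity penalty.

    Assumes lowercase whitespace tokenization and a single reference; the paper's
    actual tokenizer and aggregation over questions are not specified in the abstract.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate unigram count by how often it appears in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    f1s = []
    for k in classes:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    print(bleu1("the cat sat", "the cat sat on the mat"))  # ≈ 0.368: precision 1.0, brevity penalty e^-1
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # ≈ 0.733: mean of 2/3 (class 0) and 4/5 (class 1)
    # Size ratio behind the "∼80×" claim: 562B (largest Med-PaLM M) / 7B (OpenMedQ, per the TL;DR).
    print(round(562 / 7, 1))  # 80.3
```

Macro-F1 weights every class equally regardless of frequency, which is why it is a stricter summary than accuracy on imbalanced medical classification sets; the "average macro-F1" in the abstract is then, presumably, the mean of this quantity across the 8 benchmarks.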

Visa & Travel: Yes

Read CFP & Author Instructions: Yes

Originality Policy: Yes

Single-blind & Not Under Review Elsewhere: Yes

LLM Policy: Yes

Submission Number: 120
