Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets

Jiho Shin

10 Oct 2025 (modified: 11 Oct 2025), EurIPS 2025 Workshop MedEurIPS Submission, CC BY 4.0

Keywords: chest X-ray, foundation models, medical imaging, multimodal learning, embedding evaluation, clustering analysis, healthcare AI

TL;DR: We benchmark MedImageInsight and CXR-Foundation on MIMIC-CXR and NIH-CXR14, showing that compact embeddings yield more stable and transferable chest X-ray representations.

Abstract: Recent foundation models have demonstrated strong performance in medical image representation learning, yet how they compare across datasets remains underexplored. This work benchmarks two large-scale chest X-ray (CXR) embedding models, CXR-Foundation (ELIXR v2.0) and MedImageInsight, on the public MIMIC-CXR and NIH ChestX-ray14 datasets. Both models were evaluated with a unified preprocessing pipeline and fixed downstream classifiers to ensure a reproducible comparison. We extracted embeddings directly from the pre-trained encoders, trained lightweight LightGBM classifiers on multiple disease labels, and reported mean AUROC and F1-score with 95% confidence intervals. MedImageInsight achieved slightly higher performance on most tasks, while CXR-Foundation showed strong cross-dataset stability. Unsupervised clustering of MedImageInsight embeddings further revealed coherent disease-specific structure, consistent with the quantitative results. These results highlight the need for standardised evaluation of medical foundation models and establish reproducible baselines for future multimodal and clinical integration studies.
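The abstract reports mean AUROC and F1-score with 95% confidence intervals but does not say how the intervals were computed. The sketch below assumes a nonparametric percentile bootstrap over test cases, which is one common choice, not necessarily the authors' procedure. The helper names (`auroc`, `f1`, `bootstrap_ci`, `mean_auroc`) are hypothetical, and the code uses only the Python standard library so it runs without LightGBM; in the described pipeline the scores would be the LightGBM classifier's predicted probabilities for one disease label.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# A metric maps (true labels, scores or predictions) to a single number.
Metric = Callable[[Sequence[int], Sequence[float]], float]


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(y_true: Sequence[int], y_pred: Sequence[float]) -> float:
    """F1 for binary predictions (y_pred holds 0/1 labels, e.g. thresholded probabilities)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(
    y_true: Sequence[int],
    preds: Sequence[float],
    metric: Metric,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval, resampling cases with replacement."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present to resample")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip single-class resamples, where AUROC is undefined
            stats.append(metric(yt, [preds[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return metric(y_true, preds), lo, hi


def mean_auroc(labels: Sequence[Sequence[int]], scores: Sequence[Sequence[float]]) -> float:
    """Macro-average AUROC over disease labels (one label column and one score column per disease)."""
    return mean(auroc(y, s) for y, s in zip(labels, scores))
```

For AUROC, `preds` are continuous scores; for F1 they must first be thresholded to 0/1, and the abstract does not state which threshold was used. Resampling whole test cases keeps each bootstrap replicate the same size as the original test split.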

Submission Number: 25