Keywords: multimodal foundation model, bias, zero-shot, pneumonia, ensembles
TL;DR: We present ways to deploy foundation models with optimized text and vision prompts for stronger and more equitable clinical performance across diverse patient populations.
Abstract: Foundation models (FMs) have shown impressive performance in medical image analysis tasks, but their deployment in real-world clinical settings, especially across diverse patient populations such as adult and pediatric cases, remains challenging. Key open questions include which prompting techniques work best and how models should be adapted or fine-tuned for clinical use. In this study, we evaluated different approaches for deploying FMs in clinical scenarios for diverse patient populations. We used the lightweight, embedding-based vision-language FM $\textit{MedImageInsight}$ to predict pneumonia from chest X-rays, a condition common in both adult and pediatric patients.
We observed large variation in predictive performance depending on the chosen prompt design, highlighting the importance of text prompt design for successful zero-shot (ZS) application. On in-domain datasets, we found performance differences of up to 46% in Matthews correlation coefficient (MCC) and 56% in true positive rate across different text prompts.
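For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how MCC and the true positive rate can be computed from binary predictions. The function names and the toy labels are illustrative assumptions and are not taken from the paper or its repository.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def true_positive_rate(y_true, y_pred):
    """Fraction of positive cases that were predicted positive (sensitivity)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Toy labels (hypothetical): 1 = pneumonia, 0 = no pneumonia.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_mcc = mcc(toy_true, toy_pred)
toy_tpr = true_positive_rate(toy_true, toy_pred)
```

Computing both metrics per prompt makes it possible to compare prompts on the same test set, which is the kind of comparison the reported differences refer to.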
By introducing text and vision embedding ensembles, we achieved substantial ZS improvements, outperforming training-based methods (fine-tuning, linear probe) in low-data scenarios by up to 43% for adults and 35% for pediatric populations (MCC). This ensembling strategy also promotes resource-efficient, equitable clinical use by supporting diverse demographic subgroups, achieving MCC improvements of 6% by sex, 17% by age, and 10% by race compared to a linear probe.
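The abstract does not specify how the embedding ensembles are built. As a hedged illustration of one common approach to text-prompt ensembling for zero-shot classification, the sketch below averages the L2-normalized embeddings of several prompts per class and assigns the class whose averaged embedding has the highest cosine similarity to the image embedding. All names and the toy vectors are hypothetical; in practice the embeddings would come from a vision-language model such as MedImageInsight, whose API is not shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_embedding(embeddings):
    """Average unit-normalized embeddings, then re-normalize the result."""
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_predict(image_embedding, class_prompt_embeddings):
    """Pick the class whose prompt-ensemble embedding is closest to the image.

    class_prompt_embeddings maps a class label to a list of text embeddings,
    one per prompt variant (an assumed data layout, for illustration only).
    """
    img = l2_normalize(image_embedding)
    scores = {}
    for label, prompt_embs in class_prompt_embeddings.items():
        prototype = mean_embedding(prompt_embs)
        # Cosine similarity reduces to a dot product for unit vectors.
        scores[label] = sum(a * b for a, b in zip(img, prototype))
    best = max(scores, key=scores.get)
    return best, scores


# Toy 3-dimensional embeddings (made up; real embeddings are much larger).
class_prompts = {
    "pneumonia": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "no pneumonia": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
label, scores = zero_shot_predict([0.8, 0.3, 0.0], class_prompts)
```

An analogous averaging step could be applied on the vision side, for example over embeddings of several views of the same image, although the abstract does not say whether the authors do this.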
Primary Subject Area: Foundation Models
Secondary Subject Area: Transfer Learning and Domain Adaptation
Paper Type: Both
Registration Requirement: Yes
Reproducibility: https://github.com/loufay/Beyond-the-prompt/tree/main
Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Created a single midl25_NNN.zip file with midl25_NNN.tex, midl25_NNN.bib, all necessary figures and files., Includes \documentclass{midl}, \jmlryear{2025}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package, Did not use the times package., All authors and co-authors are correctly listed with proper spelling and avoid Unicode characters., Author and institution details are de-anonymized where needed. All author names, affiliations, and paper title are correctly spelled and capitalized in the biography section., References must use the .bib file. Did not override the bibliographystyle defined in midl.cls. Did not use \begin{thebibliography} directly to insert references., Tables and figures do not overflow margins; avoid using \scalebox; used \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., Appendices and supplementary material are included in the same PDF after references., Main paper does not exceed 9 pages; acknowledgements, references, and appendix start on page 10 or later.
Latex Code: zip
Copyright Form: pdf
Submission Number: 78