Visual Prompt Engineering for Vision Language Models in Radiology

Stefan Denner; Markus Ralf Bujotzek; Dimitrios Bounias; David Zimmerer; Raphael Stock; Klaus Maier-Hein

Visual Prompt Engineering for Vision Language Models in Radiology

Stefan Denner, Markus Ralf Bujotzek, Dimitrios Bounias, David Zimmerer, Raphael Stock, Klaus Maier-Hein

Published: 27 Mar 2025, Last Modified: 30 May 2025MIDL 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Vision-Language Models, Localized Classification, Zero-Shot Classification

TL;DR: We improve zero-shot chest X-ray classification by adding visual markers (arrows, boxes, circles) to guide vision-language models' attention.

Abstract: Medical image classification plays a crucial role in clinical decision-making, yet most models are constrained to a fixed set of predefined classes, limiting their adaptability to new conditions. Contrastive Language-Image Pretraining (CLIP) offers a promising solution by enabling zero-shot classification through multimodal large-scale pretraining. However, while CLIP effectively captures global image content, radiology requires a more localized focus on specific pathology regions to enhance both interpretability and diagnostic accuracy. To address this, we explore the potential of incorporating visual cues into zero-shot classification, embedding visual markers, such as arrows, bounding boxes, and circles, directly into radiological images to guide model attention. Evaluating across four public chest X-ray datasets, we demonstrate that visual markers improve AUROC by up to 0.185, highlighting their effectiveness in enhancing classification performance. Furthermore, attention map analysis confirms that visual cues help models focus on clinically relevant areas, leading to more interpretable predictions. To support further research, we use public datasets and provide our codebase and preprocessing pipeline, serving as a reference point for future work on localized classification in medical imaging.

Primary Subject Area: Application: Radiology

Secondary Subject Area: Detection and Diagnosis

Paper Type: Validation or Application

Registration Requirement: Yes

Reproducibility: https://github.com/MIC-DKFZ/VPE-in-Radiology

Visa & Travel: Yes

Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Created a single midl25_NNN.zip file with midl25_NNN.tex, midl25_NNN.bib, all necessary figures and files., Includes \documentclass{midl}, \jmlryear{2025}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package, Did not use the times package., All authors and co-authors are correctly listed with proper spelling and avoid Unicode characters., Author and institution details are de-anonymized where needed. All author names, affiliations, and paper title are correctly spelled and capitalized in the biography section., References must use the .bib file. Did not override the bibliographystyle defined in midl.cls. Did not use \begin{thebibliography} directly to insert references., Tables and figures do not overflow margins; avoid using \scalebox; used \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., Appendices and supplementary material are included in the same PDF after references., Main paper does not exceed 9 pages; acknowledgements, references, and appendix start on page 10 or later.

Latex Code: zip

Copyright Form: pdf

Submission Number: 80

Loading