KEPIL: Knowledge-Enhanced Prompt-Image Learning for Prompt-Robust Disease Detection

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Robustness, Medical Image Analysis, Human-AI alignment, Knowledge Injection, Vision Language Model.
Abstract: Vision–language models (VLMs) show promise for clinical decision support in radiology because they enable joint reasoning over radiological images and clinical text, thereby leveraging complementary clinical information. However, radiological findings are long-tailed in practice, leaving some conditions underrepresented and making zero-shot inference essential. Yet current CLIP-style medical VLMs are sensitive to prompt variations and often lack trustworthy external knowledge at inference time, which hinders reliable clinical deployment. We present KEPIL, a prompt-robust framework that integrates curated medical knowledge to stabilize zero-shot generalization. KEPIL comprises: (i) dynamic prompt enrichment using ontologies with LLM assistance, (ii) a semantic-aware contrastive loss aligning embeddings of equivalent prompt variants via a dual-embedding objective, and (iii) entity-centric report standardization to yield ontology-aligned representations. Across seven benchmarks, KEPIL achieves state-of-the-art zero-shot/fine-tuning performance in classification and segmentation; under prompt-variation tests, it improves AUC by 6.37% on CheXpert and by 4.11% on average. Ablations and qualitative analyses validate the contributions of enriched prompts and semantic alignment, while attention maps highlight clinically relevant regions. These results show that structured knowledge and robust prompt design are key to clinically reliable radiology-facing VLMs. Code will be released at ***.
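The abstract's component (ii), a contrastive loss that aligns embeddings of semantically equivalent prompt variants, can be illustrated with a minimal sketch. The formulation below is an assumption, not the paper's actual loss: it combines a CLIP-style symmetric InfoNCE term (matching each image to the centroid of its prompt-variant embeddings) with a variant-consistency term that pulls each variant toward that centroid. Function and variable names are illustrative.

```python
import numpy as np

def prompt_variant_contrastive_loss(img_emb, txt_variants, temperature=0.07):
    """Hypothetical sketch of a semantic-aware contrastive objective.

    img_emb:      (B, D) image embeddings.
    txt_variants: (B, V, D) embeddings of V equivalent prompt phrasings
                  per image/class.
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    img = l2norm(img_emb)                      # (B, D)
    txt = l2norm(txt_variants)                 # (B, V, D)
    txt_mean = l2norm(txt.mean(axis=1))        # (B, D) centroid over variants

    # CLIP-style similarity matrix between images and prompt centroids.
    logits = img @ txt_mean.T / temperature    # (B, B)

    def softmax_xent(lg):
        # Cross-entropy with the matching pair on the diagonal.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    loss_i2t = softmax_xent(logits)            # image -> text
    loss_t2i = softmax_xent(logits.T)          # text -> image

    # Variant-consistency term: each phrasing stays close to its centroid,
    # so no single prompt wording dominates the learned alignment.
    consistency = (1.0 - (txt * txt_mean[:, None, :]).sum(-1)).mean()

    return 0.5 * (loss_i2t + loss_t2i) + consistency
```

A training loop would average this loss over batches while the text encoder receives the ontology-enriched prompt variants described in component (i).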
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 6014