# MedVLP Zero-Shot Sensitivity Analysis
The official code base for **How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?**

## Abstract

Recent advancements in medical vision-language pre-training (MedVLP) have significantly enhanced zero-shot medical vision tasks such as image classification by leveraging large-scale medical image-text pair pre-training. However, the performance of these tasks can be heavily influenced by the variability in textual prompts describing the categories, necessitating robustness in MedVLP models to diverse prompt styles. Yet, this sensitivity remains underexplored. In this work, we are the first to systematically assess the sensitivity of three widely-used MedVLP methods to a variety of prompts across 15 different diseases. To achieve this, we designed six unique prompt styles to mirror real clinical scenarios, which were subsequently ranked by interpretability. Our findings indicate that all MedVLP models evaluated show unstable performance across different prompt styles, suggesting a lack of robustness. Additionally, the models' performance varied with increasing prompt interpretability, revealing difficulties in comprehending complex medical concepts. This study underscores the need for further development in MedVLP methodologies to enhance their robustness to diverse zero-shot prompts.

## Environment Setup

Since our work depends heavily on three MedVLP models ([BioViL](https://www.microsoft.com/en-us/research/publication/making-the-most-of-text-semantics-to-improve-biomedical-vision-language-processing/), [MedKLIP](https://chaoyi-wu.github.io/MedKLIP/), and [KAD](https://www.nature.com/articles/s41467-023-40260-7)), we recommend a separate environment for each model.

We provide the environment setup for the BioViL model; the setup for the other two can be found in their respective repositories.

### BioViL 

To create the environment, run the following command:

`conda env create -f environment.yml`

### MedKLIP

Please find the environment setup at https://github.com/MediaBrain-SJTU/MedKLIP

### KAD

Please find the environment setup at https://github.com/xiaoman-zhang/KAD/tree/main


## Datasets
 
All datasets should be downloaded and placed under the `data` directory.

### COVIDx CXR-4 Dataset

COVIDx CXR-4 is a major expansion of the COVIDx CXR dataset series. It includes 84,818 CXR images from 45,342 patients and has two classes: COVID-19 positive and COVID-19 negative. In this study, we used the official test set, which is class-balanced with 4,241 images in each category, totalling 8,482 images.

The dataset can be obtained at https://www.kaggle.com/datasets/andyczhao/covidx-cxr2

### NIH ChestXray14 Dataset

ChestX-ray14 consists of 112,120 CXR images across 14 disease classes: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia. Our tests strictly followed the official train-test split, using a test set that includes 25,597 chest X-ray samples.

The dataset can be obtained at https://nihcc.app.box.com/v/ChestXray-NIHCC

### CheXpert Dataset

The CheXpert dataset contains 224,316 CXR images. We used the official test set, which includes 500 CXR images annotated by radiologists. Following the original paper, our study focuses on the evaluation of 5 observations on the official test set: Atelectasis, Cardiomegaly, Consolidation, Edema and Pleural Effusion.

The test dataset can be obtained at https://github.com/rajpurkarlab/cheXpert-test-set-labels


## Model Checkpoints and Scripts

We do not redistribute the model checkpoints, but they can be obtained from the respective repositories of the models.

### BioViL

The BioViL model is downloaded automatically from the official source when the code is run.

All scripts for testing BioViL are written by us and included in the repository.

### MedKLIP

The MedKLIP model can be obtained at https://github.com/MediaBrain-SJTU/MedKLIP

Please copy the file `checkpoint_final.pth` and put it under the `models` directory.

Please copy the entire `models` directory from `Sample_zero-shot_Classification_CXR14` and put it under the `src\medklip` directory.
Additionally, copy the `observation explanation.json` file from `Sample_zero-shot_Classification_CXR14` and put it under the `data` directory.

### KAD

The KAD model can be obtained at https://github.com/xiaoman-zhang/KAD/tree/main

Please copy the files `best_valid.pt` and `epoch_latest.pt` and put them under the `models` directory.

Please copy the directories `engine`, `factory`, `models`, `optim`, `scheduler` from `KAD` and put them under the `src\kad` directory.
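
Before running the tests, it can help to confirm that the copied files ended up where this README says they should be. The following is a minimal sketch and is not part of the repository. It assumes that the `models`, `data`, and `src` directories all sit at the repository root, which may not match your layout; the name `find_missing` is introduced here purely for illustration.

```python
from pathlib import Path
from typing import List

# Paths named in this README, relative to the repository root (an assumption).
EXPECTED_PATHS = [
    "models/checkpoint_final.pth",         # MedKLIP checkpoint
    "models/best_valid.pt",                # KAD checkpoint
    "models/epoch_latest.pt",              # KAD checkpoint
    "data/observation explanation.json",   # copied from the MedKLIP repository
    "src/medklip/models",                  # copied from the MedKLIP repository
    "src/kad/engine",                      # the five directories copied from KAD
    "src/kad/factory",
    "src/kad/models",
    "src/kad/optim",
    "src/kad/scheduler",
]


def find_missing(root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]
```

Calling `find_missing()` from the repository root returns an empty list once every file and directory is in place; any path it returns still needs to be copied.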

## Reproducing Experiment Results

We provide the experiment results as `.csv` files under the `experiments` directory.

If you wish to reproduce the experiment results yourself, first set up the environments, obtain the datasets, and download the model checkpoints. Then run each test script in its matching environment:

- `test_biovil.py` with the BioViL environment.
- `test_medklip.py` with the MedKLIP environment.
- `test_kad.py` with the KAD environment.

Results will be saved under the `experiments` directory.
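
If you prefer to launch the three test scripts from a single place, one option is a small driver that calls `conda run` for each script. This is a sketch under stated assumptions, not something shipped with the repository: the environment names `biovil`, `medklip`, and `kad` are placeholders that you should replace with the names of your own environments, and the scripts are assumed to be in the current working directory.

```python
import subprocess

# (script, conda environment name) pairs.
# The environment names are placeholders, not names defined by this repository.
RUNS = [
    ("test_biovil.py", "biovil"),
    ("test_medklip.py", "medklip"),
    ("test_kad.py", "kad"),
]


def build_command(script, env_name):
    """Return the `conda run` command that executes `script` inside `env_name`."""
    return ["conda", "run", "-n", env_name, "python", script]


def run_all(runs=RUNS):
    """Run each script in its environment, one after another."""
    for script, env_name in runs:
        # check=True raises CalledProcessError if a script exits with an error,
        # so later scripts do not run after a failure.
        subprocess.run(build_command(script, env_name), check=True)
```

Calling `run_all()` runs the scripts sequentially. Running them one at a time by hand works just as well and makes it easier to see which model a failure belongs to.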

Summaries can be generated by running `summarise_experiments.py`.
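
For a quick look at the raw result files, independent of `summarise_experiments.py`, the sketch below lists each `.csv` file directly under `experiments` together with its column names and number of data rows. It makes no assumption about what the columns are called; it only assumes that the first line of each file is a header row. The function name `describe_csv_files` is hypothetical.

```python
import csv
from pathlib import Path


def describe_csv_files(directory="experiments"):
    """Return (file name, column names, data row count) for each top-level CSV."""
    summaries = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])          # empty list if the file is empty
            row_count = sum(1 for _ in reader)  # rows remaining after the header
        summaries.append((path.name, header, row_count))
    return summaries
```

For example, `for name, columns, rows in describe_csv_files(): print(name, columns, rows)` prints one line per result file.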