# Anonymous Submission README (for Review Only)

## Method Overview
We propose a Clinical Contrastive Decoding (CCD) framework for chest X-ray report generation and visual question answering.
The method augments a general-purpose multi-modal LLM with:
- Symptom-grounded Contrastive Decoding
- Expert-informed Contrastive Decoding

## Key Components
- Visual backbone + multimodal LLM (referred to as "the model")
- CheXpert pathology predictor and view classifier (from torchxrayvision)
- CCD decoding algorithm

## Repository Layout (review subset)
- `ccd.py`: main evaluation script implementing CCD with LLaVA-style models. (To preserve anonymity, the model package is not named and is not included.)
- For the radiology model, you can use open-source models such as LLaVA-Med or MAIRA-2. (You must obtain permission to use these models.)
- `readme.md`: this document.

## Dependencies
- Python >= 3.9
- PyTorch + CUDA (recommended for performance)
- Transformers, Datasets, Pillow, pydicom, tqdm, shortuuid, numpy, requests, scikit-image, torchvision, torchxrayvision
- A local importable multi-modal LLM package providing the following interfaces:
  - `constants`: `IMAGE_TOKEN_INDEX`, `DEFAULT_IMAGE_TOKEN`, `DEFAULT_IM_START_TOKEN`, `DEFAULT_IM_END_TOKEN`
  - `conversation`: `conv_templates`, `SeparatorStyle`
  - `model.builder`: `load_pretrained_model`
  - `utils`: `disable_torch_init`
  - `mm_utils`: `tokenizer_image_token`, `process_images`, `get_model_name_from_path`, `KeywordsStoppingCriteria`

**Note:** The model package name is intentionally kept anonymous for review purposes. Any multi-modal LLM with compatible APIs can be used.

## Installation (example)
```bash
# Choose a CUDA build that matches your environment
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
pip install transformers datasets pillow pydicom tqdm shortuuid numpy requests scikit-image torchxrayvision
# Install your multi-modal LLM package here, e.g.:
# git clone <anonymous-mllm-repo>
# cd <anonymous-mllm-repo>
# pip install -e .
```

Ensure the model package is importable (e.g., installed or added to `PYTHONPATH`).
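
Before launching a full run, it can help to confirm that the package exposes the interfaces listed under Dependencies. The following is a minimal sketch, not part of `ccd.py`: the helper name `missing_interfaces` is hypothetical, and `your_mllm_package` is a placeholder for whatever package you installed.

```python
import importlib

# Submodules and names taken from the Dependencies list above.
REQUIRED = {
    "constants": ["IMAGE_TOKEN_INDEX", "DEFAULT_IMAGE_TOKEN",
                  "DEFAULT_IM_START_TOKEN", "DEFAULT_IM_END_TOKEN"],
    "conversation": ["conv_templates", "SeparatorStyle"],
    "model.builder": ["load_pretrained_model"],
    "utils": ["disable_torch_init"],
    "mm_utils": ["tokenizer_image_token", "process_images",
                 "get_model_name_from_path", "KeywordsStoppingCriteria"],
}

def missing_interfaces(package, required=REQUIRED):
    """Return dotted names that cannot be imported or are not defined."""
    missing = []
    for submodule, names in required.items():
        dotted = f"{package}.{submodule}"
        try:
            module = importlib.import_module(dotted)
        except ImportError:
            # The whole submodule is unavailable; report it once.
            missing.append(dotted)
            continue
        missing.extend(f"{dotted}.{name}" for name in names
                       if not hasattr(module, name))
    return missing

if __name__ == "__main__":
    # "your_mllm_package" is a placeholder, not a real package name.
    print(missing_interfaces("your_mllm_package"))
```

An empty list means every expected name was found. Anything else lists what still needs to be installed or added to `PYTHONPATH`.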

## Dataset
⚠️ MIMIC-CXR requires a PhysioNet account and a signed data use agreement.
Follow the instructions on the official PhysioNet website to create an account and download the dataset.

## Running Evaluation
```bash
python ccd.py \
  --model-path <path-or-hf-repo-of-your-mllm> \
  --image-folder <path-to-mimic-cxr-images> \
  --question-file <path-to-mimic-cxr-questions.jsonl> \
  --answers-file <path-to-mimic-cxr-answers.jsonl> \
  --conv-mode <conv_mode> \
  --temperature 0.0 \
  --max_new_tokens 256 \
  --alpha 0.5 \
  --beta 0.5 \
  --boost-gamma 10.0
```

Outputs: a JSONL file with fields `question_id`, `prompt`, `text`, `model_id`, and `metadata`.
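
To sanity-check an answers file after a run, a small reader along these lines may be useful. It is a sketch that assumes only the five field names listed above. The helper name `load_answers` and the sample record values are made up for illustration.

```python
import json

REQUIRED_FIELDS = ("question_id", "prompt", "text", "model_id", "metadata")

def load_answers(lines):
    """Parse JSONL lines and check that each record has the documented fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records

# Illustrative record only; the values are invented for this example.
sample = [json.dumps({
    "question_id": "q1",
    "prompt": "Describe the findings.",
    "text": "No acute cardiopulmonary process.",
    "model_id": "my-mllm",
    "metadata": {},
})]
print(len(load_answers(sample)))
```

For a real file, pass an open file handle instead of a list: `with open(path) as f: records = load_answers(f)`.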

**Note:** Replace `<path-or-hf-repo-of-your-mllm>`, `<path-to-mimic-cxr-images>`, `<path-to-mimic-cxr-questions.jsonl>`, and `<path-to-mimic-cxr-answers.jsonl>` with actual paths.

## MedSigLIP
You must obtain access permission from the original authors to use the MedSigLIP weights.
Please refer to Google's Hugging Face page for details.
Then replace `_get_chexpert_models` in `ccd.py` with the following function, which loads MedSigLIP:
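```python
import torch
from transformers import AutoModel, AutoProcessor

_MED_SIGILIP = None  # lazily initialized (model, processor) cache

def _get_med_sigilip_model():
    """Load MedSigLIP once and reuse it. Assumes `_DEVICE` is already defined in the script."""
    global _MED_SIGILIP
    if _MED_SIGILIP is None:
        model_id = "google/medsiglip-448"
        model = AutoModel.from_pretrained(model_id).to(_DEVICE)
        model.eval()
        processor = AutoProcessor.from_pretrained(model_id)
        _MED_SIGILIP = (model, processor)
    return _MED_SIGILIP
```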
Next, add the CheXpert label list and the prediction function:

```python
findings = [
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Enlarged Cardiomediastinum",
    "Fracture",
    "Lung Lesion",
    "Lung Opacity",
    "Pleural Effusion",
    "Pneumonia",
    "Pneumothorax",
    "Pleural Other",
    "Support Devices",
]

def make_prompts(label):
    return [
        f"a chest X-ray with {label.lower()}",
        f"a chest X-ray with no {label.lower()}"
    ]

def predict_chexpert_labels_with_medsiglip(image):
    """Return {label: probability} from positive/negative prompt pairs."""
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Build one positive and one negative prompt per finding and record
    # where each pair sits in the flat prompt list.
    all_prompts = []
    label2indices = {}
    for label in findings:
        pos, neg = make_prompts(label)
        label2indices[label] = (len(all_prompts), len(all_prompts) + 1)
        all_prompts.extend([pos, neg])

    model, processor = _get_med_sigilip_model()

    inputs = processor(
        text=all_prompts, images=[image], padding="max_length", return_tensors="pt"
    ).to(_DEVICE)
    with torch.no_grad():
        outputs = model(**inputs)
    # Image-text similarity logits, one per prompt.
    logits = outputs.logits_per_image.squeeze(0)

    # Softmax over each (positive, negative) pair; keep the positive probability.
    results = {}
    for label, (pos_idx, neg_idx) in label2indices.items():
        pair_logits = torch.stack([logits[pos_idx], logits[neg_idx]])
        probs = torch.softmax(pair_logits, dim=0)
        results[label] = probs[0].item()

    return results

```
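
The per-label probability above is a softmax over just two logits, which is the same as a sigmoid of the difference between the positive and negative logits. The short, dependency-free sketch below illustrates that equivalence with made-up logit values. The helper names are for illustration only.

```python
import math

def pair_probability(pos_logit, neg_logit):
    """Softmax over [pos, neg]; returns the probability of the positive prompt."""
    m = max(pos_logit, neg_logit)  # subtract the max for numerical stability
    e_pos = math.exp(pos_logit - m)
    e_neg = math.exp(neg_logit - m)
    return e_pos / (e_pos + e_neg)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up logits: only the gap between them matters.
pos, neg = 2.0, 0.0
print(pair_probability(pos, neg), sigmoid(pos - neg))
```

Because only the gap matters, a label scores above 0.5 exactly when its positive prompt matches the image better than its negative prompt.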

## Method Details (concise)
1. Predict CheXpert label probabilities from the input image using torchxrayvision.
2. Map labels to tokenizer token IDs; assign positive/negative biases proportional to confidence (with clipping).
3. Build two prompts: (a) original, (b) augmented with concise clinical guidance derived from labels.
4. Decode by mixing token log-probabilities from (a) and (b) with weight `alpha`, then add label-informed biases, and finally blend with weight `beta`.
5. Sample or greedily select next tokens until stop.
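
Steps 4 and 5 can be sketched for a single decoding step as follows. This is one plausible reading of the description, not the exact rule in `ccd.py`: it assumes the mix is a convex combination weighted by `alpha` and that the label-informed bias is added with weight `beta`. The function names and the toy 3-token vocabulary are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def ccd_step(logits_orig, logits_guided, bias, alpha=0.5, beta=0.5):
    """Return next-token scores for one step (illustrative only)."""
    logp_a = log_softmax(logits_orig)    # prompt (a): original
    logp_b = log_softmax(logits_guided)  # prompt (b): with clinical guidance
    # Assumed mixing rule: convex combination weighted by alpha.
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(logp_a, logp_b)]
    # Assumed blending rule: add the label-informed bias scaled by beta.
    return [m + beta * t for m, t in zip(mixed, bias)]

def greedy_pick(scores):
    """Index of the highest-scoring token (greedy decoding)."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 3-token vocabulary; token 2 receives a positive label bias.
scores = ccd_step([2.0, 1.0, 0.0], [0.0, 1.0, 2.0], bias=[0.0, 0.0, 3.0])
print(greedy_pick(scores))
```

With `alpha=0` and a zero bias, the scores reduce to the original prompt's log-probabilities, which is a convenient check when comparing against a baseline.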

Key hyperparameters:
- `alpha`: weight of symptom-grounded contrastive decoding.
- `beta`: weight of expert-informed contrastive decoding.
- `boost_gamma`: bias clipping strength for expert-informed decoding.
- `temperature`, `top_p`, `max_new_tokens`: user-set decoding controls.
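
To illustrate how `boost_gamma` can bound the label-informed biases from step 2, here is a small sketch. The exact mapping in `ccd.py` may differ: `scale`, `threshold`, the helper names, and the toy token IDs are all assumptions made for this example.

```python
def label_bias(prob, boost_gamma=10.0, scale=40.0, threshold=0.5):
    """Signed bias: positive when the label is likely, negative when unlikely.

    `scale` and `threshold` are hypothetical knobs for this sketch.
    """
    raw = scale * (prob - threshold)                 # proportional to confidence
    return max(-boost_gamma, min(boost_gamma, raw))  # clip to [-gamma, +gamma]

def build_token_bias(label_probs, label_token_ids, vocab_size, boost_gamma=10.0):
    """Add each label's bias to the token IDs mapped to that label."""
    bias = [0.0] * vocab_size
    for label, prob in label_probs.items():
        for token_id in label_token_ids.get(label, []):
            bias[token_id] += label_bias(prob, boost_gamma)
    return bias

# Toy example with a made-up 6-token vocabulary and invented token IDs.
probs = {"Edema": 0.9, "Pneumothorax": 0.1}
token_ids = {"Edema": [2], "Pneumothorax": [4, 5]}
print(build_token_bias(probs, token_ids, vocab_size=6))
```

Clipping keeps a single very confident prediction from overwhelming the language model's own token scores.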

## Reproducibility Notes
- GPU is recommended. On CPU, change `.cuda()` calls to device-agnostic `.to(device)` and set device accordingly.
- torchxrayvision will download pretrained weights on first use.
- The dataset must be downloaded manually due to access restrictions.

## Anonymity Checklist
- No author names, affiliations, or private URLs in code or text
- The model package is unnamed; do not include institution-specific repos
- Paths in examples are generic; logs contain no user info
- Do not commit model checkpoints or data that could deanonymize the authors

## Contact (anonymized)
For questions during the review process, please use the conference discussion forum. No identifying information is included here by design.
