# On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models

This repository runs generation and evaluation on LLaVA-1.5 (7B / 13B) with several decoding methods. The goal is to measure and reduce object hallucination. When our method is enabled, it additionally leverages adversarially perturbed images.

## Requirements 

To run generation and evaluation, prepare the following datasets and environment:

### 1. MSCOCO 2014 Validation Set
- **Download:** [MSCOCO val2014](https://cocodataset.org/#download)
- **Instructions:** Extract `val2014` into the `dataset/coco` directory.
- **Annotations:** Place `captions_train2014.json`, `captions_val2014.json`, `instances_train2014.json`, and `instances_val2014.json` in the `dataset/coco/annotations` directory.
```text
dataset/
└── coco/
    ├── val2014/
    └── annotations/
        ├── captions_train2014.json
        ├── captions_val2014.json
        ├── instances_train2014.json
        └── instances_val2014.json
```
### 2. AMBER Images
- **Download:** [AMBER Image Set](https://drive.google.com/file/d/1MaCHgtupcZUjf007anNl4_MV0o4DjXvl/view)
- **Instructions:** Extract the archive into the `dataset/AMBER` directory.
```text
dataset/
└── AMBER/
    └── image/
```
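
Before running anything, it can help to confirm that the directory layout matches the trees above. The following is a minimal sketch, not part of the repository: `EXPECTED_PATHS` and `missing_dataset_paths` are hypothetical names, and the check assumes the datasets live under a `dataset/` directory relative to where you run it.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the directory trees in this README.
EXPECTED_PATHS = [
    "coco/val2014",
    "coco/annotations/captions_train2014.json",
    "coco/annotations/captions_val2014.json",
    "coco/annotations/instances_train2014.json",
    "coco/annotations/instances_val2014.json",
    "AMBER/image",
]


def missing_dataset_paths(root: str = "dataset") -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Dataset layout looks complete.")
```

An empty return value means every expected file and directory was found; the check does not validate file contents.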

### 3. Environment

To set up the environment, run:

```bash
conda create -n epistemic python=3.9
conda activate epistemic
pip install -r requirements.txt
```

### 4. (Optional) Install `pattern` Library Locally For CHAIR Evaluation

Clone the `pattern` library into your project directory:

```bash
git clone https://github.com/clips/pattern.git baselines/pattern
```

To install the library, run:
```bash
conda install -c conda-forge mysqlclient
pip install -e ./baselines/pattern
```

## Models

* `llava-1.5-7b`
* `llava-1.5-13b`

## Supported Decoding Methods

* `greedy`
* `opera`
* `vcd`
* `pai`
* `devils`
* `beam`
* `dola`

## Datasets

* `chair`
* `pope`
* `amber` 


## Generate Adversarial Images
`baselines/attack_llava.py` attacks the images under `<data_path>` (e.g. *val2014 from COCO*) and writes the perturbed copies to the `output` directory. Pass that directory as `attack_folder` when applying our method. The script accepts the following arguments:

- `model`: Vision encoder to attack. Default: `openai/clip-vit-large-patch14-336` for LLaVA-1.5.
- `data`: Root directory of the images to attack.
- `output`: Name of the output directory for the attacked images.
- `epsilon`: Size of the epsilon ball (`k` in the paper). Default: `3`.
- `alpha`: Learning rate (step size) of the adversarial attack. Default: `1`.
- `steps`: Total number of PGD attack iterations (`I` in the paper). Default: `200`.
- `lambda_feat`: Coefficient of the loss. Default: `1`.
- `dataset`: Name of the dataset. Choices: `["chair", "pope_rand", "pope_pop", "pope_adv", "amber"]`.
- `bf16`: Whether to use mixed-precision optimization for efficient backpropagation. Default: `True`.

Generate adversarial images by running:

```bash
python baselines/attack_llava.py --data <data_path> --epsilon 3 --steps 200 --bf16 --dataset <dataset> --output <your_output_name>
```
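
The `epsilon`, `alpha`, and `steps` arguments follow the usual projected gradient descent (PGD) pattern. The sketch below is a framework-free illustration of that pattern and is not the code in `baselines/attack_llava.py`. It assumes an L-infinity ball, signed-gradient steps, and pixel values on a 0-255 scale, none of which this README confirms. `pgd_linf` and the toy `grad_fn` are hypothetical names; the real script attacks the vision encoder rather than a toy objective.

```python
from typing import Callable, List


def pgd_linf(
    image: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    epsilon: float = 3.0,
    alpha: float = 1.0,
    steps: int = 200,
) -> List[float]:
    """Ascend a loss with signed-gradient steps inside an L-inf ball.

    Assumption: pixels are flat floats on a 0-255 scale.
    """
    adv = list(image)
    for _ in range(steps):
        grad = grad_fn(adv)
        for i, g in enumerate(grad):
            # Step of size alpha in the direction sign(g).
            value = adv[i] + alpha * ((g > 0) - (g < 0))
            # Project back into the epsilon ball around the clean pixel.
            value = min(max(value, image[i] - epsilon), image[i] + epsilon)
            # Keep the result a valid pixel value.
            adv[i] = min(max(value, 0.0), 255.0)
    return adv


# Toy objective (assumption): push every pixel away from mid-gray (128).
def toy_grad(x: List[float]) -> List[float]:
    return [2.0 * (v - 128.0) for v in x]


clean = [10.0, 128.5, 254.0]
print(pgd_linf(clean, toy_grad))  # → [7.0, 131.5, 255.0]
```

Each pixel moves by at most `epsilon` from its clean value, and the last pixel stops at 255 because of the valid-range clamp.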

## Generation & Evaluation

Prepare the following LLaVA-1.5 base model checkpoints:

Download the [LLaVA-1.5 merged 7B model](https://huggingface.co/liuhaotian/llava-v1.5-7b) and specify its path at line 13 of `eval_configs/llava-1.5_7b_eval.yaml`.

Download the [LLaVA-1.5 merged 13B model](https://huggingface.co/liuhaotian/llava-v1.5-13b) and specify its path at line 13 of `eval_configs/llava1.5_13b_eval.yaml`.

From the `baselines/` directory:

```bash
# For CHAIR generation & evaluation
sh bash_scripts/eval_chair.sh

# For POPE generation & evaluation
sh bash_scripts/eval_pope.sh

# For AMBER generation & evaluation
sh bash_scripts/eval_amber.sh
```

For `bash_scripts/eval_pope.sh`, set `caption_file_path` to one of the following:
```bash
random=$data_path/coco/pope/coco_pope_random.json
adversarial=$data_path/coco/pope/coco_pope_adversarial.json
popular=$data_path/coco/pope/coco_pope_popular.json

# Example
--caption_file_path $random 
```
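
If you drive the POPE runs from Python instead of editing the shell script by hand, the split-to-file mapping above can be expressed as a small helper. This is only a sketch: `POPE_FILES` and `pope_caption_file` are hypothetical names that do not exist in the repository, and the paths simply mirror the three variables listed above.

```python
from pathlib import Path

# File names copied from the three POPE variables listed above.
POPE_FILES = {
    "random": "coco_pope_random.json",
    "adversarial": "coco_pope_adversarial.json",
    "popular": "coco_pope_popular.json",
}


def pope_caption_file(data_path: str, split: str) -> Path:
    """Return the POPE question file for `split` under `data_path`."""
    if split not in POPE_FILES:
        raise ValueError(f"unknown POPE split {split!r}; expected one of {sorted(POPE_FILES)}")
    return Path(data_path) / "coco" / "pope" / POPE_FILES[split]


# Print the --caption_file_path argument for every split.
for split in POPE_FILES:
    print(f"--caption_file_path {pope_caption_file('dataset', split).as_posix()}")
```

Rejecting unknown split names early avoids launching a long generation run against a file that does not exist.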

## Script Breakdown

* Each script runs on `eval_scripts/eval_caption.py`.
* Results are written to `./results/<dataset>/<date>/<EXP>`.

To modify the decoding method used in evaluation:

- Open the appropriate script in `bash_scripts/` (e.g., `eval_amber.sh`)

- Locate the line that sets the `method` variable

- Set it to one of the supported methods:

```bash
# Example
method=devils   # Options: greedy, opera, vcd, pai, devils, etc.
```
To enable the proposed method, set the `use_ours` variable to `True`, choose a value for `k_sig` **(equivalent to `sigma_th` in our paper)**, and point `attack_folder` to the directory of attacked images:

```bash
use_ours=True
k_sig=1.1

# Example
--attack_folder $data_path/AMBER/fast_attack_llava_amber_eps_3_step_200 \
```

## Acknowledgement

This repository builds upon the LVLM codebases from [Hallucination-Attribution](https://github.com/TianyunYoung/Hallucination-Attribution) and [HALC](https://github.com/BillChan226/HALC). We sincerely thank the authors for their valuable and inspiring work.