# MARINE: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

## Contents
- [Install](#install)
- [Prepare Visual Guidance](#prepare-visual-guidance)
- [Generation with MARINE](#generation-with-marine)
- [Evaluation](#evaluation)
- [Acknowledgements](#acknowledgements)

## Install
Our implementation builds on the official repository of each supported model. Please follow the instructions in [LLaVA](https://github.com/haotian-liu/LLaVA), LLaVA2, mPLUG-Owl2, [InstructBLIP](https://github.com/salesforce/LAVIS) and [MiniGPTv](https://github.com/Vision-CAIR/MiniGPT-4) to clone the model repository, prepare the environment, and download the model weights.


## Prepare Visual Guidance
The prepared visual guidance files are in the `./POPE/llava_qa/question/` directory, which contains the question files for all experiments in the paper.

- image captioning tasks: use the guidance strength of 0.95, and the corresponding visual guidance file is `I4_mmc4_grey_th0.95.json`.

- POPE-related tasks: use the guidance strength of 0.5, and the corresponding visual guidance file is `[pope type]_mmc3_grey_th0.5.json`. For example, `gqa_ad_mmc3_grey_th0.5.json` is for the GQA dataset with the POPE type `adversarial`.

- MME tasks: use `[mme type]_mmc3_grey_th0.95.json`.

To prepare visual guidance on your own data:

1. Generate visual guidance using DETR. Run the following command after changing `IMAGE_DIR` to point to your images.
```bash
bash ./detr/auto_detr.sh
```
You can set the noise intensity by changing `--th`.

2. Build negative prompts (visual guidance) in the prompt format used for CFG generation. Run the following command after changing `QUESTION_FILE`.
```bash
python ./POPE/QA_generation.py
```

## Generation with MARINE
To generate captions with MARINE (e.g. on LLaVA or LLaVA2), you can use the following command:
```bash
# llava
bash ./LLaVA/answers/llava_vqa_cfg.sh
# llava-v1.5
bash ./LLaVA2/answers/eval_llava2.sh
```
Or you can use the following command to generate captions with MARINE on other models. 
```bash
bash vqa_cfg_all.sh
```

Choose `--QUESTION_FILE` from the
`./POPE/llava_qa/question/` directory.
You can change `--cfg_values` to use a different guidance strength.
We recommend a guidance strength in the range $\gamma \in (0.3,0.7)$, which most effectively mitigates object hallucinations while keeping outputs high-quality, accurate, and faithful to the given instructions.
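
To build intuition for what the guidance strength controls, here is a minimal sketch of classifier-free guidance on next-token logits. It assumes the guided and unguided logits are mixed by linear interpolation with weight $\gamma$; the scripts in this repository may implement the mixing differently. The function names and the toy logits are hypothetical and are not part of the codebase.

```python
import math


def softmax(logits):
    """Convert logits to probabilities with the usual max-shift for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_logits(guided, unguided, gamma):
    """Assumed mixing rule: gamma * guided + (1 - gamma) * unguided.

    gamma = 0 returns the unguided logits, gamma = 1 returns the guided ones.
    """
    if len(guided) != len(unguided):
        raise ValueError("guided and unguided logits must have the same length")
    return [gamma * g + (1.0 - gamma) * u for g, u in zip(guided, unguided)]


# Toy three-token vocabulary with made-up logits.
unguided = [2.0, 1.0, 0.0]  # model output without visual guidance
guided = [0.0, 1.0, 2.0]    # model output with visual guidance in the prompt

mixed = mix_logits(guided, unguided, gamma=0.5)
probs = softmax(mixed)
print(mixed, probs)
```

Under this assumption, a larger $\gamma$ moves the next-token distribution closer to the one conditioned on the visual guidance, and a smaller $\gamma$ keeps it closer to the original model.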

## Evaluation

### CHAIR Evaluation
The CHAIR metrics are computed based on the official implementation.
Please see [Maxlinn](https://github.com/Maxlinn/CHAIR-metric-standalone) for details.
```bash
bash ./LLaVA/answers/auto_chair_llava.sh
bash ./LLaVA2/answers/auto_chair_llava2.sh
```
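
For readers unfamiliar with CHAIR, the sketch below illustrates the two scores on toy data. It is a simplified illustration, not the official implementation: it assumes the mentioned and ground-truth objects have already been extracted as sets, and it skips the synonym mapping to object categories that the official code performs. The function name and the example data are hypothetical.

```python
def chair_scores(captions):
    """Compute simplified CHAIR_i and CHAIR_s.

    captions: list of (mentioned_objects, ground_truth_objects) pairs of sets.
    CHAIR_i = hallucinated object mentions / all object mentions.
    CHAIR_s = captions with at least one hallucinated object / all captions.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for mentioned, truth in captions:
        hallucinated = mentioned - truth  # mentioned but not in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(captions) if captions else 0.0
    return chair_i, chair_s


# Toy data: the first caption mentions a frisbee that is not in the image.
toy_captions = [
    ({"dog", "frisbee"}, {"dog", "person"}),
    ({"cat"}, {"cat"}),
]
chair_i, chair_s = chair_scores(toy_captions)
print(chair_i, chair_s)
```

Lower values are better for both scores.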
### POPE Evaluation
The POPE metrics are computed based on the official implementation. Please see [POPE](https://github.com/AoiDragon/POPE) for details.
```bash
bash ./LLaVA/answers/auto_pope_llava.sh
bash ./LLaVA2/answers/auto_pope_llava2.sh
```
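
As a quick reference, POPE treats each question as a binary yes/no classification and reports accuracy, precision, recall, F1, and the ratio of "yes" answers. The sketch below shows these computations on toy data, assuming the answers have already been normalized to `"yes"` or `"no"`. The function name and the example lists are hypothetical; the official scripts handle answer parsing themselves.

```python
def pope_metrics(predictions, labels):
    """Compute POPE-style metrics with "yes" as the positive class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        if pred == "yes" and label == "yes":
            tp += 1
        elif pred == "yes" and label == "no":
            fp += 1
        elif pred == "no" and label == "no":
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }


# Toy data: one true positive, one false positive, one true negative, one false negative.
toy_predictions = ["yes", "yes", "no", "no"]
toy_labels = ["yes", "no", "no", "yes"]
metrics = pope_metrics(toy_predictions, toy_labels)
print(metrics)
```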

### GPT-4V Evaluation
Specify your API key and run:
```bash
python ./gpt4v/gpt4v_eval.py
```

### Additional Evaluation
You can also evaluate the generated captions with other metrics, such as BLEU, METEOR, ROUGE, and CIDEr. 
```bash
python ./VLMEvalKit/vlmeval/evaluate/coco_eval.py \
--data $data_path
```

---
## Acknowledgements
We thank the authors of LLaVA, LLaVA2, mPLUG-Owl2, InstructBLIP, MiniGPTv, CHAIR, POPE, and VLMEvalKit for their excellent work.