## External Baseline Evaluation

This directory collects six representative baseline methods for analyzing and mitigating object hallucination in vision-language models (VLMs). Each entry gives a brief description, a link to the paper, and a link to the official code.

---

### Methods on llava-hf/llava-1.5-7b-hf

These three methods run on the Hugging Face model `llava-hf/llava-1.5-7b-hf`. If you have already run our source code, they can be executed directly in the same environment.

#### 1. MLIH: Middle Layers Indicating Hallucinations

* **Paper**: [Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens](https://arxiv.org/abs/2411.16724v3)
* **Code**: [github.com/ZhangqiJiang07/middle\_layers\_indicating\_hallucinations](https://github.com/ZhangqiJiang07/middle_layers_indicating_hallucinations)
* **Brief**: MLIH studies how the model attends to image tokens across layers and ties object hallucination to the middle layers, where hallucinated tokens receive less attention on the image than real ones. It uses this attention signal to detect hallucinated objects and mitigates them at inference time by adjusting visual attention across heads.

#### 2. VHR: Vision-aware Head Reinforcement

* **Paper**: [Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence](https://arxiv.org/abs/2412.13949)
* **Code**: [github.com/jinghan1he/VHR](https://github.com/jinghan1he/VHR)
* **Brief**: VHR builds on Vision-aware Head Divergence (VHD), a metric for how sensitive each attention head's output is to the visual input. At inference time it reinforces the vision-aware heads, which reduces the model's reliance on language priors and, with it, spurious object mentions. No retraining is required.

#### 3. SPIN: Image-Guided Head Suppression

* **Paper**: [Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression](https://arxiv.org/abs/2505.16411)
* **Code**: [github.com/YUECHE77/SPIN](https://github.com/YUECHE77/SPIN)
* **Brief**: SPIN is a training-free, inference-time strategy that suppresses the attention heads paying the least attention to image tokens and keeps the heads most attentive to the image. The intervention acts inside the attention layers rather than on the output logits.

---

### Methods on liuhaotian/llava-v1.5-7b

The following three methods use the original `liuhaotian/llava-v1.5-7b` checkpoint. They are integrated as submodules in the `third_party` directory, and each requires its own virtual environment. Follow each project's official setup instructions together with our notes on the required path modifications.

#### 4. VCD: Visual Contrastive Decoding

* **Paper**: [Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding](https://arxiv.org/abs/2311.16922)
* **Code**: [github.com/DAMO-NLP-SG/VCD](https://github.com/DAMO-NLP-SG/VCD)
* **Brief**: VCD is a training-free decoding method. At each step it contrasts the output distribution conditioned on the original image with the one conditioned on a distorted (noised) copy of the image. Tokens that remain likely even when the visual evidence is degraded are penalized, which reduces over-reliance on statistical bias and language priors.
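
The decoding rule published in the VCD paper combines the logits from the original image and from a distorted image, then masks tokens that are implausible under the original distribution. The following is a minimal sketch of that rule on plain Python lists, not the official implementation; the function names and the default values of `alpha` and `beta` are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(logits_original, logits_distorted, alpha=1.0, beta=0.1):
    """Sketch of visual contrastive decoding for one decoding step.

    logits_original:  next-token logits given the original image.
    logits_distorted: next-token logits given the distorted image.
    alpha: contrast strength (illustrative default).
    beta:  plausibility cutoff relative to the top probability (illustrative default).
    """
    p_original = softmax(logits_original)
    cutoff = beta * max(p_original)
    adjusted = []
    for l_orig, l_dist, p in zip(logits_original, logits_distorted, p_original):
        if p >= cutoff:
            # Amplify the original logits and subtract the distorted ones.
            adjusted.append((1 + alpha) * l_orig - alpha * l_dist)
        else:
            # Token is implausible under the original distribution: mask it.
            adjusted.append(float("-inf"))
    return adjusted
```

The next token is then sampled (or chosen greedily) from `softmax(adjusted)`. The plausibility mask keeps the subtraction from promoting tokens that the model already considered very unlikely given the real image.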

#### 5. PAI: Pay Attention to Image

* **Paper**: [Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs](https://arxiv.org/abs/2407.21771)
* **Code**: [github.com/LALBJ/PAI](https://github.com/LALBJ/PAI)
* **Brief**: PAI is a training-free method that amplifies the attention weights assigned to image tokens during inference. It also subtracts the logits obtained from the text-only input from those of the multimodal input, which counteracts the model's tendency to follow the language context regardless of the image.

#### 6. OPERA: Over-trust Penalty and Retrospection-Allocation

* **Paper**: [OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation](https://arxiv.org/abs/2311.17911)
* **Code**: [github.com/shikiw/OPERA](https://github.com/shikiw/OPERA)
* **Brief**: OPERA is a decoding method built on beam search. It penalizes candidate logits when the self-attention pattern shows over-trust in a few summary tokens, and its retrospection-allocation step rolls back and reselects tokens when that pattern appears. It requires no additional data or training.

---

## Usage

1. For each baseline, enter its folder and run the provided Jupyter Notebook (e.g., `vhr_eval.ipynb`). This will automatically generate JSONL files with model outputs and predictions.
2. After all baseline outputs are ready, return to the root directory and run `evaluate.ipynb` to compute unified metrics and compare all methods.
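
Before running `evaluate.ipynb`, it can help to sanity-check each baseline's JSONL output. The sketch below assumes one JSON object per line; the `hallucinated` field name and the file path in the usage comment are hypothetical and should be replaced with whatever the notebooks actually write.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSONL file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


def flagged_fraction(records, key="hallucinated"):
    """Fraction of records whose `key` field is truthy (`key` is a hypothetical field name)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(key)) / len(records)


# Usage (hypothetical path):
# records = load_jsonl("vhr/outputs.jsonl")
# print(len(records), flagged_fraction(records))
```

An empty file or a fraction that is exactly 0 or 1 for every baseline usually points to a path or field-name mismatch rather than a real result.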

---

For any questions or feedback, please open an issue or contact the maintainers.

---

