<div align="center">

<h1>To Trust Or Not To Trust Your Vision-Language Model's Prediction</h1>





<div style="text-align:center">
<img src="TrustVLM.png"  width="80%" height="100%">
</div>

---

</div>

Illustration of TrustVLM's mechanism. Initially, the incorrect prediction receives a higher confidence score than the correct one, so the model is overconfident in its error. Verifying the prediction in the image embedding space mitigates this overconfidence. As a result, the final confidence score of the correct prediction is significantly higher than that of the incorrect one.
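
The mechanism in the caption can be made concrete with a scoring rule of the following shape. This is an illustrative sketch, not the exact score computed by this repository. The notation is assumed: $f$ is the VLM image encoder, $t_c$ is the text embedding of the prompt for class $c$ (out of $C$ classes), $\tau$ is a temperature, $g$ is the image encoder used for verification (the VLM's own or an auxiliary one), and $\mathcal{S}_c$ is a small set of labeled images of class $c$ whose averaged embeddings form a visual prototype $p_c$.

```math
\begin{aligned}
\hat{y} &= \arg\max_{c} \cos\big(f(x), t_c\big) \\
s_{\text{txt}}(x) &= \max_{c} \frac{\exp\big(\cos(f(x), t_c)/\tau\big)}{\sum_{c'=1}^{C} \exp\big(\cos(f(x), t_{c'})/\tau\big)} \\
p_c &= \frac{1}{|\mathcal{S}_c|} \sum_{x' \in \mathcal{S}_c} g(x') \\
s_{\text{img}}(x) &= \cos\big(g(x), p_{\hat{y}}\big) \\
s(x) &= s_{\text{txt}}(x) + s_{\text{img}}(x)
\end{aligned}
```

Under this form, an overconfident wrong prediction $\hat{y}$ keeps a high $s_{\text{txt}}(x)$. However, if $g(x)$ lies far from the prototype $p_{\hat{y}}$, its $s_{\text{img}}(x)$ is low. This pulls the combined score $s(x)$ below that of a correct prediction whose image embedding sits close to its class prototype.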



## Prerequisites

### Environment 
The code was tested with `Python 3.10.13` and `torch 2.3.1+cu121` on an NVIDIA GeForce RTX 3090. Additional dependencies are listed in `requirement.txt`.

### Datasets 

We suggest downloading all datasets into a single root directory and renaming each dataset's directory to the name given for it in `ID_to_DIRNAME` in `./data/datautils.py`.
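
Before running the scripts, it can help to confirm that every expected dataset directory exists under the root. The following is a minimal sketch and is not part of this repository. `check_dataset_dirs` is a hypothetical helper. The commented example assumes that `ID_to_DIRNAME` is a Python dict whose values are directory names without spaces, and that the command is run from the repository root.

```bash
# Hypothetical helper (not part of this repository).
# Usage: check_dataset_dirs ROOT NAME [NAME ...]
# Prints every NAME that is not a directory under ROOT and
# returns 1 if at least one is missing, 0 otherwise.
check_dataset_dirs() {
  local root="$1"
  shift
  local status=0
  local name
  for name in "$@"; do
    if [ ! -d "$root/$name" ]; then
      echo "missing: $root/$name"
      status=1
    fi
  done
  return "$status"
}

# Example, assuming ID_to_DIRNAME is a dict of directory names (no spaces)
# and this is run from the repository root:
#   check_dataset_dirs /path/to/datasets \
#     $(python -c "from data.datautils import ID_to_DIRNAME; print(' '.join(ID_to_DIRNAME.values()))")
```

The check only tests that each directory exists. It does not validate the contents of a dataset.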


## Run TrustVLM-D
```
bash ./test_trustvlm.sh
```


## Run TrustVLM*-D
```
bash ./test_trustvlm_v2.sh
```

