# UnSAF: A Self-Assessment Framework of Uncertainty Awareness for Multimodal LLMs
This repository contains the implementation of UnSAF: A Self-Assessment Framework of Uncertainty Awareness for Multimodal LLMs.


## Setup
Set up a virtual environment and install all requirements:
```bash
pip install -r requirements.txt
```

The MLLMs used in our experiments are all available on Hugging Face (downloading may ask for your Hugging Face token).
You can either
(1) follow the guide at https://huggingface.co/docs/hub/models-downloading to download the MLLMs; or
(2) follow the steps below.
```bash
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
export HF_ENDPOINT=https://hf-mirror.com
```
Then download a model with `./hfd.sh the-model-name`, or a dataset with `./hfd.sh the-dataset-name --dataset`.


## Codebase structure
The `/Model` folder contains the code for obtaining MLLM responses.

The `/main` folder contains the code that calls the evaluator functions.

The `/evaluator` folder contains the code for evaluating MLLMs on metrics such as UnF1 score, ECE, accuracy, and hallucination.

The `/utils` folder contains common helper code: prompts used to guide MLLMs (`./prompt`, `./GPTprompt`), system prompts (`./conversation`), key-point extraction from answers (`./utils`), and conventional metrics for uncertainty evaluation (`./ECE`).

The `/scripts` folder contains the bash scripts that users run.
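For readers unfamiliar with ECE (expected calibration error), the following is a minimal sketch of the conventional binned definition. It is not the code in `./ECE`; the function name `expected_calibration_error`, the 10-bin default, and the equal-width binning are assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    Hypothetical helper for illustration; the repository's own ECE code may
    bin or weight differently.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hits = [0] * n_bins        # number of correct answers in each bin
    count = [0] * n_bins       # number of samples in each bin

    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes
        # into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += int(ok)
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = hits[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece
```

A lower value means the model's stated confidence tracks its actual accuracy more closely.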

## Get Activations

First, fill in the blank model path in the bash script you intend to run.

### Scaling Law--Impact of model size on uncertainty awareness mirrors accuracy.
To test whether the impact of model size on uncertainty awareness mirrors its impact on accuracy, first use the UnSAF framework to get the UnF1 score of the MLLMs. Run:
```
bash ./scripts/Uncertainty.sh
```
Next, to get the accuracy on the open-ended dataset, run:
```
bash ./scripts/ACC.sh
```
To get the accuracy on the multiple-choice dataset, run:
```
bash ./scripts/ACC/ACC.sh
```
Finally, plot the results to visualize them. From the plot you can also observe that **Instruction-Tuned models are more timid**.

### Prompt Engineering--Prompt engineering provides limited uncertainty-aware benefits.
To measure the effect of prompt engineering on MLLMs, run:
```
bash ./scripts/ProUncertainty.sh
```

### UnSAF dataset construction
To get the chain-of-thought (CoT) answers of the teacher model, run:
```
bash ./scripts/CotUncertainty.sh
```
The non-CoT answers are obtained with `./scripts/Uncertainty.sh`.

### (Optional) UnD Distillation

Distillation only requires fine-tuning the original MLLMs on the UnSAF data.

#### Uncertainty-aware evaluation
Change the model path and rerun `./scripts/Uncertainty.sh`.

#### Accuracy evaluation
change the model path and repeat the

#### Hallucination evaluation
To get CHAIR_i and F1 under the CHAIR framework, run:
```
bash ./scripts/Hallu/CHAIR.sh
```
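As a reminder of what CHAIR_i measures, here is a minimal sketch of the per-instance CHAIR score: the fraction of object mentions in the generated captions that do not appear in the image's ground-truth objects. The function name `chair_i` and its inputs are hypothetical; the sketch assumes objects have already been extracted from the captions and mapped to canonical names, which the real pipeline has to do first.

```python
from typing import Iterable, List, Set


def chair_i(
    mentioned_objects: Iterable[List[str]],
    ground_truth_objects: Iterable[Set[str]],
) -> float:
    """CHAIR_i = hallucinated object mentions / all object mentions.

    mentioned_objects: for each caption, the objects it mentions (assumed to
        be already extracted and normalized).
    ground_truth_objects: for each image, the set of objects really present.
    Hypothetical helper for illustration, not the repository's implementation.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    for mentioned, truth in zip(mentioned_objects, ground_truth_objects):
        for obj in mentioned:
            total_mentions += 1
            if obj not in truth:
                # The caption names an object that is not in the image.
                hallucinated_mentions += 1
    if total_mentions == 0:
        return 0.0  # no objects mentioned, so nothing was hallucinated
    return hallucinated_mentions / total_mentions


# Usage: the first caption mentions "car", which is not in its image.
score = chair_i(
    [["dog", "frisbee", "car"], ["cat"]],
    [{"dog", "frisbee"}, {"cat"}],
)
```

Lower CHAIR_i indicates fewer hallucinated objects.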
To get the F1 under the POPE framework, run:
```
bash ./scripts/Hallu/POPE.sh
```
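POPE poses yes/no questions about whether an object is present, so its F1 is the usual binary F1 with "yes" as the positive class. The sketch below shows that computation; the function name `pope_f1` is hypothetical, and it assumes model answers have already been normalized to the strings `"yes"` or `"no"`.

```python
from typing import Dict, Sequence


def pope_f1(predictions: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Precision, recall, and F1 with "yes" treated as the positive class.

    Hypothetical helper for illustration; answers are assumed to be already
    normalized to "yes" / "no".
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p == "yes" and l == "yes")
    fp = sum(1 for p, l in pairs if p == "yes" and l == "no")
    fn = sum(1 for p, l in pairs if p == "no" and l == "yes")

    # Guard against division by zero when a class never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Usage: one true positive, one false positive, one false negative.
result = pope_f1(["yes", "yes", "no", "no"], ["yes", "no", "yes", "no"])
```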