# Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models

<p align="center" width="100%">
<img src="figs/overview_upd.png" width="100%" height="100%">
</p>

## Requirements

### Installation
We mainly follow the [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main) environment for installation.    
We use NVIDIA A100 GPUs with 80 GB of memory.
A single GPU is used for LMM inference and two GPUs for instruction tuning.

```Shell
conda create -n upd_en python=3.10 -y
conda activate upd_en
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]" # not necessary for inference
pip install flash-attn --no-build-isolation --no-cache-dir  # not necessary for inference
```

### Data
#### MM-UPD Bench
![UPD_SETTING_OVERVIEW](figs/example_each_setting.png)

We provide all benchmarks (.tsv) via [this url](https://drive.google.com/file/d/1n1y1TOdd7FCVbVtKk-HXhFWSFY5OPHVB/view?usp=sharing).
Please download and untar the archive, then place its contents in `~/data`.

#### Instruction tuning data (Optional)
We provide the instruction tuning data (.json) via [this url](https://drive.google.com/file/d/1SI3h0QFn5F9VMVtOBJ2sPKq8nBZgc7nA/view?usp=sharing).  
Please download and unzip it, then place it in `~/data`.  
For the images, we use the same images as the official LLaVA instruction tuning. Please download them from the constituent datasets:
- COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip)
- GQA: [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
- OCR-VQA: [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing), **we save all files as `.jpg`**
- TextVQA: [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
- VisualGenome: [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)

We also provide the instruction-tuned checkpoints via [this url](https://drive.google.com/file/d/1J9GBDzsniL0SP17b8dJAyUro3tKQOyLk/view?usp=drive_link).
Please download and unzip them, then place them in `~/checkpoints`.

The overall file structure is as follows:
```
UPD
├── checkpoints
│   ├── llava-v1.6-vicuna-13b-task-lora
│   └── llava-v1.6-34b-task-lora
└── data
    ├── inst_tuning
    │   ├── upd_tuning_data_20240303.json
    │   ├── coco
    │   │   └── train2017
    │   ├── gqa
    │   │   └── images
    │   ├── ocr_vqa
    │   │   └── images
    │   ├── textvqa
    │   │   └── train_images
    │   └── vg
    │       ├── VG_100K
    │       └── VG_100K_2
    ├── mmaad_20240303_base.tsv
    ├── mmaad_20240303_option.tsv
    ├── mmiasd_20240303_base.tsv
    ├── mmiasd_20240303_option.tsv
    ├── mmivqd_20240303_base.tsv
    └── mmivqd_20240303_option.tsv
```
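
Before running inference, it can help to confirm that the benchmark files are where the layout above places them. The following is a minimal sketch, assuming `data/` sits directly under the repository root as shown; the function name `check_upd_data` is hypothetical and not part of this repository.

```bash
# Hypothetical helper (not part of this repository): report which of the
# benchmark .tsv files listed in the tree above are missing.
# Assumption: the files sit directly under <root>/data.
check_upd_data() {
    local root="${1:-.}"
    local missing=0
    local task setting file
    for task in mmaad mmiasd mmivqd; do
        for setting in base option; do
            file="${root}/data/${task}_20240303_${setting}.tsv"
            if [ ! -f "${file}" ]; then
                echo "missing: ${file}"
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only when every expected file is present.
    [ "${missing}" -eq 0 ]
}

# Usage, from the repository root:
#   check_upd_data . && echo "all benchmark files found"
```

The check only covers the six benchmark files; the instruction tuning data and checkpoints are optional for inference, so they are left out here.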

### API KEY
API keys for OpenAI and Gemini are required.
Once you have created them, export them as environment variables:
```bash
export OPENAI_API_KEY='your-api-key-here'
```

```bash
export GEMINI_API_KEY='your-api-key-here'
```

For details on setting up API keys, please refer to the [OpenAI official page](https://platform.openai.com/docs/quickstart?context=python).
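
As a minimal sketch, the check below confirms that both variables are set in the current shell before a script is launched. The function name `require_api_keys` is hypothetical and not part of this repository; it only tests that the variables are non-empty and does not validate the keys themselves.

```bash
# Hypothetical helper (not part of this repository): fail early when an
# API key is unset or empty, before starting a long run.
require_api_keys() {
    local missing=0
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "error: OPENAI_API_KEY is not set" >&2
        missing=1
    fi
    if [ -z "${GEMINI_API_KEY:-}" ]; then
        echo "error: GEMINI_API_KEY is not set" >&2
        missing=1
    fi
    # Succeed only when both keys are present.
    [ "${missing}" -eq 0 ]
}

# Usage:
#   require_api_keys && echo "API keys found"
```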

## Quick Start
### 1. Inference of LMMs
The inference scripts are located in `~/scripts/inference/<LMM name>/<UPD>`.
For example, to run LLaVA-1.5 13B in the base setting,
use the following command for the AAD and Standard scenarios:

#### base
```bash
bash scripts/inference/llava1.5/aad/base.sh
```
Running the above script automatically writes the result to `output/aad/answers_upload/llava1.5/base/mmaad_base/llava1.5-13b_<time_stamp>.xlsx`.

### 2. Evaluation
The evaluation scripts are located in `~/scripts/evaluation/<UPD>`.

For example, to evaluate the performance of LLaVA-1.5 13B in the base setting,
run the following command:
```bash 
bash scripts/evaluation/aad/eval_base.sh <RESULT_PATH>
```
> * `<RESULT_PATH>` is `output/aad/answers_upload/llava1.5/base/mmaad_base/llava1.5-13b_<time_stamp>.xlsx` in this example.

Running the above script automatically saves the evaluation result in the folder that contains `<RESULT_PATH>`.
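
Because the result file name includes a timestamp, the path passed to the evaluation script changes with every run. The sketch below picks the most recently modified `.xlsx` file in a result directory. The function name `latest_result` is hypothetical and not part of this repository, and it assumes the newest file by modification time is the run you want to evaluate.

```bash
# Hypothetical helper (not part of this repository): print the most recently
# modified .xlsx file in a result directory.
# Assumption: the newest file by modification time is the run you want;
# the <time_stamp> format in the file name is not relied upon.
latest_result() {
    local dir="$1"
    # ls -t sorts by modification time, newest first.
    ls -t "${dir}"/*.xlsx 2>/dev/null | head -n 1
}

# Usage, from the repository root:
#   RESULT_PATH="$(latest_result output/aad/answers_upload/llava1.5/base/mmaad_base)"
#   [ -n "${RESULT_PATH}" ] && bash scripts/evaluation/aad/eval_base.sh "${RESULT_PATH}"
```

The function prints nothing when the directory holds no `.xlsx` file, so the usage lines guard against an empty path before calling the evaluation script.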

### 3. Instruction Tuning
The instruction tuning scripts are located in `~/scripts/inst_tuning`.
For example, to tune LLaVA-1.6 34B,
run the following command:
```bash
bash scripts/inst_tuning/llava1.6_34b_lora_tuning.sh
```


## Acknowledgement
This repository builds on the following codebases:
* [Visual Instruction Tuning](https://github.com/haotian-liu/LLaVA), in NeurIPS, 2023.
* [OpenCompass](https://github.com/open-compass/opencompass)
* [Otter](https://github.com/Luodian/Otter/)
