# GPT as Visual Explainer

This repository contains the source code implementation for LVX (Language Model as Visual Explainer), a systematic approach presented in our paper for interpreting the internal workings of vision models using tree-structured language explanations, without the need for additional model training.

## Getting Started

### Installation
To get started with LVX, please install the required dependencies by running the following command:
```shell
pip install -r requirements.txt
```



### File Orgnization 
```
calibrate/
    ├── calibrate.py
    └── ...
data/
    ├── <DATA_NAME>/
    |   ├── attr2prompt.py
    |   ├── class2attr.py
    │   ├── <DATA_NAME>_classes.txt
    │   ├── <DATA_NAME>_parsetree_gt.json
    │   ├── <DATA_NAME>_prompts.json
    │   └── <DATA_NAME>.json
dataset/
    ├── <COSTUM_DATASET>.py
evaluate/
    ├── compute_<METRICS_NAME>.py
models/
    ├── densenet.py
    └── ...
tools/
    ├── query_image.py
    └── ...
lvx_<DATA_NAME>.py
random_<DATA_NAME>.py
constant_<DATA_NAME>.py
README.md
```



## Tree Construction

LVX involves constructing a tree structure that captures the hierarchical relationships between visual concepts. The tree construction process consists of the following steps:

1. **Initial Tree Construction:** In this step, the initial tree structure is constructed by utilizing the Language and Logic Module (LLM). The LLM generates hierarchical descriptions of visual concepts for each category, forming the basis of the tree structure.

```shell
python data/<DATA_NAME>/class2attr.py
python data/<DATA_NAME>/attr2prompt.py
```
With the python script:
```python
api_key = "<OPENAI_API_KEY>"  # Replace with your own API key
prompt_text = """<IN-CONTEXT PROMPT>"""
prompt = [{"role": "user", "content": prompt_text}]
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=prompt
)
```
The results is an attribute tree file `<DATA_NAME>.json` and a prompt tree file `<DATA_NAME>_prompt.json`

1. **Image Collection**: To enrich the tree structure, relevant images are collected using a text-to-image API. These images align with the textual descriptions generated by the LLM and provide visual representations of the concepts in the tree structure.
Example Shell Command:

```shell
python tools/query_image.py
```

Code example for image collection with Bing search:
```python
directory = '../imagenet_images' # <Support Image Folder>
# Example usage:
download_images_bing('imagenet/imagenet_prompt.json',
                        directory, number_image=8)
```
By doing this, the support image corresponding to each attribute will be saved at `<Support Image Folder>`, each within a seperated foler

3. **Tree Refinement**: The constructed tree structure is dynamically refined by querying the LLM using language templates. This refinement process allows for the adaptation of the tree structure to the specific model and the elimination of undesired concepts based on model predictions.


## Test-time Explain

During test time, LVX generates human-understandable explanations in the form of attribute-laden trees for input samples. The explanations are tailored to the specific image content and provide insights into the internal workings of the vision model.

```shell
python lvx_<DATA_NAME>.py \
--gpu_id 0 \
--support_data_dir <Support Image Folder> \
--prompt_file data/<DATA_NAME>/<DATA_NAME>_prompt.json \
--classifier <CLASSIFIER_NAME> >> "$OUTPUT_FILE"
```

Then you will get a file named `<DATA_NAME>_test_tree_<CLASSIFIER_NAME>.json`. It contains the tree explation for each test sample, organized as a json.

## Evaluate the performance

```shell
echo "Running compute_TED.py with classifier $CLASSIFIER_NAME..."
PYTHONPATH="$PWD" python evaluate/compute_TED.py \
<DATA_NAME>_test_tree_<CLASSIFIER_NAME>.json \
imagenet_parsetree_gt.json >> "$OUTPUT_FILE"

echo "Running compute_mcs.py with classifier $CLASSIFIER_NAME..."
PYTHONPATH="$PWD" python evaluate/compute_mcs.py \
<DATA_NAME>_test_tree_<CLASSIFIER_NAME>.json \
imagenet_parsetree_gt.json >> "$OUTPUT_FILE"

echo "Running compute_tkd.py with classifier $CLASSIFIER_NAME..."
PYTHONPATH="$PWD" python evaluate/compute_tkd.py \
<DATA_NAME>_test_tree_<CLASSIFIER_NAME>.json \
imagenet_parsetree_gt.json >> "$OUTPUT_FILE"
```
Alternatively, you can run the following script to perform both explanation and evaluation:
```shell
bash script/eval_<DATA_NAME>_lvx.sh
```

## [optional] Model Calibration

In addition to explanation generation, LVX offers a model calibration mechanism. The vision model can be retrained by calibrating it on the generated concept hierarchy. This refinement process enables the model to incorporate the enhanced structure of visual attributes extracted through LVX.

To perform model calibration, run the following command:
```shell
bash calibrate/train_<DATA_NAME>.sh
```