# VPGTrans: Transfer Visual Prompt Generator across LLMs
We have temporarily removed all config files that contain external links to avoid leaking information.

## Installation
**1. Prepare the code**

```bash
pip install -e .
```


## VL-Vicuna Demo

**1. Prepare the pretrained Vicuna weights**  
To run VL-Vicuna locally, you first need to prepare the pretrained Vicuna weights.  
The current version of VL-Vicuna is built on the v0 version of Vicuna-7B.
The final weights should be in a single folder with a structure similar to the following:

```
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...   
```
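
Before launching the demo, it can help to confirm that the folder is complete. The following is a minimal sketch, assuming the weights use the standard Hugging Face sharded-checkpoint layout shown above, in which `pytorch_model.bin.index.json` holds a `weight_map` from parameter names to shard filenames. The function name `missing_weight_files` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Files listed in the folder structure above.
REQUIRED_FILES = ("config.json", "generation_config.json", "pytorch_model.bin.index.json")


def missing_weight_files(weights_dir):
    """Return a sorted list of expected files that are absent from weights_dir."""
    root = Path(weights_dir)
    missing = {name for name in REQUIRED_FILES if not (root / name).is_file()}

    index_path = root / "pytorch_model.bin.index.json"
    if index_path.is_file():
        # Assumption: the index follows the Hugging Face format, whose
        # "weight_map" maps each parameter name to the shard that stores it.
        with index_path.open() as f:
            weight_map = json.load(f).get("weight_map", {})
        for shard in set(weight_map.values()):
            if not (root / shard).is_file():
                missing.add(shard)
    return sorted(missing)
```

Calling `missing_weight_files("vicuna_weights")` returns an empty list when every expected file and every shard referenced by the index is present.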

## Evaluation

## Training
Stage-1 pre-training requires the COCO caption and SBU datasets.
Stage-2 additionally requires VG caption and Laion-COCO.

If you want to exclude some datasets from training, comment out the corresponding entries in the config file, for example:
```yaml
# lavis/projects/blip2/train/llama_vpgtrans_step1_proj_warmup.yaml
datasets:
  coco_caption:
    vis_processor:
        train:
          name: "blip2_image_train"
          image_size: 224
        eval:
          name: "blip_image_eval"
          image_size: 224
    text_processor:
        train:
          name: "blip_caption"
        eval:
          name: "blip_caption"
#  the SBU will not be used in the pre-training
#  sbu_caption:
#    vis_processor:
#        train:
#          name: "blip2_image_train"
#          image_size: 224
#    text_processor:
#        train:
#          name: "blip_caption"
```
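
To double-check which datasets remain enabled after editing, the config can be scanned for uncommented entries under `datasets:`. This is a minimal, standard-library-only sketch that assumes the two-space indentation used in the config above. It is not a YAML parser, and the helper name `active_datasets` is hypothetical. A real YAML loader would be more robust if one is available.

```python
import re


def active_datasets(config_text):
    """Return dataset names that appear uncommented directly under `datasets:`."""
    names = []
    in_datasets = False
    for line in config_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and commented-out entries
        if not line.startswith(" "):
            # A top-level key either starts or ends the `datasets:` section.
            in_datasets = line.rstrip() == "datasets:"
            continue
        # Assumption: dataset names sit at exactly two spaces of indentation.
        match = re.match(r"^  ([A-Za-z0-9_]+):\s*$", line)
        if in_datasets and match:
            names.append(match.group(1))
    return names


example = """\
datasets:
  coco_caption:
    vis_processor:
        train:
          name: "blip2_image_train"
#  sbu_caption:
#    vis_processor:
"""
print(active_datasets(example))  # ['coco_caption']
```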

### VL-LLaMA Training

**1. Stage-1 Projector Warm-up**  
First, you need to download the [BLIP2 OPT-6.7B checkpoint]().

The first step (optional) is to initialize the projector with the word converter:
```bash
CUDA_VISIBLE_DEVICES=0 python tools/linear_proj/train_linear_proj_opt_and_llama.py \
  facebook/opt-6.7b \
  /path/to/llama_7b_dir/ \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/output_dir
```  
Then, run the projector warm-up:
```bash
bash run_scripts/blip2/scale_up_train/llama_vpgtrans_step1_proj_warmup.sh \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/projector_init_weight \
  /path/to/llama_7b_dir/
```

**2. Stage-2 Direct Fine-tuning**  
Please run:
```bash
bash run_scripts/blip2/scale_up_train/llama_vpgtrans_step2_direct_finetune.sh \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/stage1_proj_warmup_checkpoint \
  /path/to/llama_7b_dir/
```
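
The two stages above can also be chained from a small driver script. The sketch below only assembles the commands shown in this section. The function name `build_llama_commands` and its parameter names are hypothetical, and all paths are placeholders that you supply. In particular, the location where stage 1 writes its checkpoint is not stated here, so it is passed in explicitly rather than guessed.

```python
import shlex

SCRIPT_DIR = "run_scripts/blip2/scale_up_train"


def build_llama_commands(blip2_ckpt, projector_init, llama_dir, stage1_ckpt):
    """Return the stage-1 and stage-2 commands as argument lists, in run order."""
    stage1 = ["bash", f"{SCRIPT_DIR}/llama_vpgtrans_step1_proj_warmup.sh",
              blip2_ckpt, projector_init, llama_dir]
    stage2 = ["bash", f"{SCRIPT_DIR}/llama_vpgtrans_step2_direct_finetune.sh",
              blip2_ckpt, stage1_ckpt, llama_dir]
    return [stage1, stage2]


if __name__ == "__main__":
    commands = build_llama_commands(
        "/path/to/blip2_opt6.7b_ckpt",
        "/path/to/projector_init_weight",
        "/path/to/llama_7b_dir/",
        "/path/to/stage1_proj_warmup_checkpoint",
    )
    for argv in commands:
        # Dry run: print each command instead of executing it.
        # To launch for real, pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Printing the commands first makes it easy to verify the argument order before committing GPU time to a run.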

### VL-Vicuna Training
Vicuna is an instruction-tuned version of LLaMA.
Most of the scripts are the same as those used for training VL-LLaMA.

**1. Stage-1 Projector Warm-up**  
First, you need to download the [BLIP2 OPT-6.7B checkpoint]().

The first step (optional) is to initialize the projector with the word converter:
```bash
CUDA_VISIBLE_DEVICES=0 python tools/linear_proj/train_linear_proj_opt_and_llama.py \
  facebook/opt-6.7b \
  /path/to/vicuna_7b_dir/ \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/output_dir
```  
Then, run the projector warm-up:
```bash
bash run_scripts/blip2/scale_up_train/llama_vpgtrans_step1_proj_warmup.sh \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/projector_init_weight \
  /path/to/vicuna_7b_dir/
```

**2. Stage-2 Direct Fine-tuning**  
Please run:
```bash
bash run_scripts/blip2/scale_up_train/llama_vpgtrans_step2_direct_finetune.sh \
  /path/to/blip2_opt6.7b_ckpt \
  /path/to/stage1_proj_warmup_checkpoint \
  /path/to/vicuna_7b_dir/
```

**3. Stage-3 Visual Instruction Tuning**  
To adapt the model to conversational scenarios, we run a short tuning stage on MiniGPT-4's self-instruct data (around 3,000 images).
Please run:
```bash
bash run_scripts/blip2/scale_up_train/vicuna_vpgtrans_step3_self_instruct.sh \
  /path/to/stage2_direct_tuning_checkpoint \
  /path/to/vicuna_7b_dir/
```
