## Installation

```
pip install -r requirements.txt
bash install.sh
```

Unless you installed CUDA via conda, remove the part of `install.sh` that sets environment variables before running it.


## Data Preparation

```
cd code/prepare_data
python refcoco_mix.py
```

## Training

```
cd code
python build_script.py --config configs/phi3v_ours.yaml --opt 1
bash train1.sh
```

- `--config`: path to a config file; see `code/configs/*` for the available options.
- `--opt {1,2}`: selects which group of GPUs to use, either `[0-7]` or `[8-15]`. It also sets `master_port` for DeepSpeed.
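
To illustrate the idea behind `--opt`, here is a minimal sketch of how an option value could map to a GPU group and a distinct port. It is not the code in `build_script.py`: the function name `gpu_group`, the assumption that `1` maps to GPUs 0-7 and `2` to GPUs 8-15, and the base port `29500` are all hypothetical. The only point is that two jobs on the same machine need non-overlapping GPUs and different `master_port` values.

```python
def gpu_group(opt: int, gpus_per_group: int = 8, base_port: int = 29500) -> dict:
    """Map an --opt value to a GPU list and a port (hypothetical mapping)."""
    if opt not in (1, 2):
        raise ValueError("opt must be 1 or 2")
    # Assumption: opt=1 -> GPUs 0-7, opt=2 -> GPUs 8-15.
    start = (opt - 1) * gpus_per_group
    gpu_ids = list(range(start, start + gpus_per_group))
    return {
        # Format accepted by the `--include` flag of the deepspeed launcher.
        "include": "localhost:" + ",".join(str(i) for i in gpu_ids),
        # Assumption: one port per group so concurrent jobs do not collide.
        "master_port": base_port + (opt - 1),
    }


if __name__ == "__main__":
    for opt in (1, 2):
        print(opt, gpu_group(opt))
```

The returned values correspond to what one would pass to the `deepspeed` launcher as `--include` and `--master_port`.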

## Caveats

- For llava-onevision or qwen2-vl, install `transformers >= 4.45.0` (as of 14 Sep 2024, this requires installing from the source repository).
- jsonargparse does not work well with multi-GPU runs: the config YAML file is loaded only on rank 0.
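
For the `transformers` version requirement above, a quick check before training can save a failed run. The sketch below is an assumption about how you might verify it; `release_tuple` and `meets_minimum` are hypothetical helper names. Source installs typically report a development version such as `4.45.0.dev0`, which strict PEP 440 ordering places before `4.45.0`, so this sketch compares only the numeric release part and treats such builds as sufficient.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '4.45.0.dev0' -> (4, 45, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "4.45.0") -> bool:
    """True if the installed release is at least `minimum` (dev builds count)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.45' and '4.45.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"transformers {installed}: {status}")
```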
