## COAST-domain
- Download the `Cambrian-10M`[1] images from Hugging Face. Only four of its datasets are used: chartqa, docvqa, iconqa, and medicalqa.

- After downloading and unzipping, organize the folders as follows:
- `playground/data/chartqa/`: #chartqa dataset
  - `train/`
  - `val/`
  - `test/`
- `playground/data/docvqa/`: #docvqa dataset
  - `images/`
- `playground/data/hfdata/`: #iconqa dataset
  - `iconqa/`
- `playground/data/pathvqa/`: #medicalqa dataset
  - `images/`
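
To catch a misplaced or incompletely unzipped archive before training, the layout above can be checked with a small script. This is a minimal sketch, not part of the repository: the function name `find_missing_dirs` and the constant `EXPECTED_DOMAIN_DIRS` are hypothetical, and the script assumes it is run from the directory that contains `playground/`.

```python
from pathlib import Path

# Expected sub-directories of playground/data, copied from the folder list above.
EXPECTED_DOMAIN_DIRS = [
    "chartqa/train",
    "chartqa/val",
    "chartqa/test",
    "docvqa/images",
    "hfdata/iconqa",
    "pathvqa/images",
]


def find_missing_dirs(root, expected=EXPECTED_DOMAIN_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory contains playground/.
    missing = find_missing_dirs("playground/data")
    if missing:
        print("Missing directories:")
        for rel in missing:
            print(f"  playground/data/{rel}")
    else:
        print("COAST-domain layout looks complete.")
```

The check only confirms that the directories exist; it does not verify their contents.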

 

## COAST-capability
- Download the `SVIT`[2] images from Hugging Face. After downloading and unzipping, organize the folders as follows:
    - `playground/data/data/SVIT/VG_100K`

## COAST-dataset
- Follow the instructions on the GitHub page of [3] to download the datasets, then organize them as follows:
- `playground/data`:
    - `ImageNet/`
    - `TextVQA/`
    - `GQA/`
    - `VizWiz/`
    - `VQAv2/`
    - `OCRVQA/`
    - `Grounding/`
    - `ScienceQA/`
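
Because these eight folders come from separate downloads, a quick summary of what is present can help spot a folder that is absent or was left empty. The following is a rough sketch under the assumption that each dataset sits directly under `playground/data`, as listed above; `summarize_datasets` and `DATASET_DIRS` are hypothetical names and not part of the repository or of [3].

```python
from pathlib import Path

# Folder names copied from the list above.
DATASET_DIRS = [
    "ImageNet",
    "TextVQA",
    "GQA",
    "VizWiz",
    "VQAv2",
    "OCRVQA",
    "Grounding",
    "ScienceQA",
]


def summarize_datasets(root, names=DATASET_DIRS):
    """Map each folder name to its number of top-level entries.

    A value of None means the folder does not exist; 0 means it is empty.
    """
    root = Path(root)
    summary = {}
    for name in names:
        folder = root / name
        if folder.is_dir():
            summary[name] = sum(1 for _ in folder.iterdir())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    # Assumption: the current working directory contains playground/.
    for name, count in summarize_datasets("playground/data").items():
        status = "missing" if count is None else f"{count} top-level entries"
        print(f"{name}: {status}")
```

The internal structure of each dataset folder is defined by the instructions of [3], so this sketch deliberately counts only top-level entries.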


## References

[1] Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

[2] SVIT: Scaling up Visual Instruction Tuning

[3] CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model