# LLaVA-1.5-LAION and LLaVA-1.5-OpenAI
This repository contains the code for the LLaVA-1.5-LAION and LLaVA-1.5-OpenAI experiments. It is adapted from the [LLaVA repository](https://github.com/haotian-liu/LLaVA), with modifications to support the new models.

## Models

### LLaVA-1.5-LAION

LLaVA-1.5-LAION is a version of LLaVA-1.5-7B that uses a CLIP vision encoder pretrained on LAION-400M. We follow the LLaVA-1.5 recipe for both pretraining and visual instruction tuning. To download the pretraining and instruction-tuning datasets, follow the [LLaVA instructions](https://github.com/haotian-liu/LLaVA/tree/main?tab=readme-ov-file#pretrain-feature-alignment).

To pretrain the model (this stage trains the multimodal connector), run:

```bash
./scripts/pretrain.sh
```

To run visual instruction tuning on the pretrained checkpoint, run:

```bash
./scripts/finetune.sh
```
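
The two stages run in order, and instruction tuning starts from the checkpoint that pretraining produces. As a minimal sketch, a wrapper like the one below chains the stages and stops at the first script that is missing or fails. `run_stage` and `run_pipeline` are hypothetical helpers, not part of this repository, and the sketch assumes the scripts are invoked from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository): runs the given stage
# scripts in order and stops at the first one that is missing or fails.

run_stage() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing script: $script" >&2
        return 1
    fi
    echo "running: $script"
    # Invoke through bash so the script does not need the executable bit.
    bash "$script"
}

run_pipeline() {
    local script
    for script in "$@"; do
        # Abort the pipeline as soon as one stage returns a non-zero status.
        run_stage "$script" || return 1
    done
}

# Usage, from the repository root:
# run_pipeline ./scripts/pretrain.sh ./scripts/finetune.sh
```

Stopping early avoids launching instruction tuning when pretraining did not finish.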

### LLaVA-1.5-OpenAI

In the paper, we refer to the default LLaVA-1.5-7B model as LLaVA-1.5-OpenAI.

## Evals

We evaluate on [TextVQA](https://textvqa.org/dataset/) and [VQAv2](https://visualqa.org/vqa/dataset/).

Run the evals with:

```bash
./scripts/eval/textvqa.sh
./scripts/eval/vqav2.sh
```
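
The two benchmarks are independent, so it can be convenient to run both even if one fails and to keep each script's output in its own log. The following is a sketch only: `run_evals` and the `eval_logs` directory name are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): runs each eval script,
# saves its output to a per-script log file, and reports which ones failed.

run_evals() {
    local log_dir="$1"; shift
    local failed=0 script name
    mkdir -p "$log_dir"
    for script in "$@"; do
        name="$(basename "$script" .sh)"
        # Capture stdout and stderr together so failures can be inspected later.
        if bash "$script" > "$log_dir/$name.log" 2>&1; then
            echo "ok: $name"
        else
            echo "FAILED: $name (see $log_dir/$name.log)"
            failed=1
        fi
    done
    # Non-zero if any eval failed, so callers can still detect problems.
    return "$failed"
}

# Usage, from the repository root:
# run_evals eval_logs ./scripts/eval/textvqa.sh ./scripts/eval/vqav2.sh
```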
