# VisOnlyQA

This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information".

## Dataset

The dataset is in the [dataset](dataset) folder.

Due to the file size limit for supplementary material, this directory does not include the training set. Most of the images in the validation and test sets are also omitted.

## Setup

```bash
bash setup.sh
```

## Evaluate LVLMs on VisOnlyQA

Please refer to the shell scripts in the [shell/4_evaluation](shell/4_evaluation) folder.

```bash
bash shell/4_evaluation/evaluation_open_small.sh
```

## Reproduce Fine-tuning

```bash
bash shell/3_training/train_internvl2_4B.sh
```

## Reproduce Dataset Creation

Datasets are provided in the [dataset](dataset) folder. You do **not** need to run the dataset creation code to use the datasets.

If you are interested in reproducing the dataset creation process, follow the instructions below.

### Setup

We use Google Spreadsheet for annotation. To reproduce the annotation interface, you need to set up Google API credentials.

* Follow the instructions at [https://pythonhosted.org/PyDrive/quickstart.html#authentication](https://pythonhosted.org/PyDrive/quickstart.html#authentication).
* Follow the instructions at [https://docs.gspread.org/en/latest/oauth2.html](https://docs.gspread.org/en/latest/oauth2.html).
  * Download the credential file and save it as [credentials/google_spreadsheet_credential.json](credentials/google_spreadsheet_credential.json).
  * Put your Google account email address in [credentials/google_sccount_email.txt](credentials/google_sccount_email.txt).
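
As a quick sanity check after the steps above, the following minimal sketch verifies that both credential files are present and non-empty. The function name `check_credentials` is hypothetical and not part of this repository; the file names are the ones listed above, and the directory defaults to `credentials`.

```bash
# Hypothetical helper (not part of the repository): verify that both
# credential files exist and are non-empty in the given directory.
check_credentials() {
  local dir="${1:-credentials}"
  local missing=0
  local f
  for f in google_spreadsheet_credential.json google_sccount_email.txt; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the repository root, `check_credentials && echo "credentials look OK"` reports each missing file on standard error and prints the confirmation only when both files are in place.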

### Run

```bash
export HF_ACCOUNT="your_hugging_face_account"  # datasets will be created as private datasets in your HF account
export CONDA_SH="$HOME/anaconda3/etc/profile.d/conda.sh"  # set your anaconda path ("~" is not expanded inside double quotes)
bash setup.sh
```
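
Before running the script, it can help to confirm that both variables are set as intended. The following is a minimal sketch: `check_env` is a hypothetical helper, not part of this repository, and it assumes only that `HF_ACCOUNT` must be non-empty and that `CONDA_SH` should point to an existing file.

```bash
# Hypothetical helper (not part of the repository): confirm that HF_ACCOUNT
# is set and that CONDA_SH points to an existing file.
check_env() {
  if [ -z "${HF_ACCOUNT:-}" ]; then
    echo "HF_ACCOUNT is not set" >&2
    return 1
  fi
  if [ ! -f "${CONDA_SH:-}" ]; then
    echo "CONDA_SH does not point to a file: ${CONDA_SH:-}" >&2
    return 1
  fi
}
```

Calling `check_env` after the `export` lines returns a non-zero status and prints the reason if either variable is wrong. A path that still contains a literal `~` fails this check, because the shell does not expand `~` inside double quotes.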
