# Annotation Data Collection Tool for SocialNav-SUB

This repository contains **SocialNav-SUB**, a benchmark for evaluating the scene understanding abilities of vision-language models (VLMs).

---

## 🚀 Quickstart

To evaluate a model, follow these steps:

### 1. Clone the Repository
```bash
git clone <repository-url>
cd <repository-name>
```

### 2. Install Dependencies
```bash
pip install google-generativeai openai
```
To evaluate LLaVA-NeXT-Video, follow the installation instructions in its GitHub repository: https://github.com/LLaVA-VL/LLaVA-NeXT
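
As an optional sanity check, the sketch below reports which of the installed packages cannot be found in the current environment. It assumes the usual import names for the two pip packages above (`google.generativeai` for `google-generativeai`, `openai` for `openai`); the helper `missing_modules` is a hypothetical name and not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent
            # or a module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for google-generativeai and openai.
    print(missing_modules(["google.generativeai", "openai"]) or "all dependencies found")
```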

### 3. Download the Human Results
We (temporarily) provide the human results, which models are compared against, via Google Drive: https://drive.google.com/drive/folders/1QpCQitO_xGzXV1prnTI1fznu-XJvsjcZ?usp=sharing
After downloading, place the results in the `full_human_results` folder.
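
Before running an evaluation, it may help to confirm that the folder is in place. The following is a minimal sketch, assuming `full_human_results` sits in the directory you run it from; it only checks that the folder exists and is non-empty, and makes no assumption about the files inside. `human_results_ready` is a hypothetical helper name.

```python
from pathlib import Path


def human_results_ready(folder="full_human_results"):
    """Return True if `folder` exists and contains at least one entry."""
    path = Path(folder)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    print("human results found" if human_results_ready() else "human results missing")
```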

### 4. Evaluate a Model
Modify `eval_cfg.yaml`, or create your own config file and update the config filepath to point to it. In the config, set `baseline_model` to the model you want to evaluate, and adjust any other parameters as needed. To evaluate the model with the default config:
```bash
python socialnavsub/evaluate.py
```
Running this command evaluates the model and writes the results to the directory specified in the config.
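
If you prefer to keep the default config untouched, one option is to derive a copy with a different `baseline_model`. The sketch below is an illustration only: it assumes `baseline_model` is a top-level key written on a single line in `eval_cfg.yaml`, and the output file name `my_eval_cfg.yaml` and the model name `my-model` are placeholders. It edits the file as plain text, so it needs no YAML library.

```python
import re
from pathlib import Path


def with_baseline_model(cfg_text, model_name):
    """Return `cfg_text` with the top-level `baseline_model` value replaced."""
    pattern = re.compile(r"^baseline_model:.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash escapes in `model_name`.
    new_text, count = pattern.subn(lambda m: f"baseline_model: {model_name}", cfg_text)
    if count == 0:
        raise KeyError("no top-level `baseline_model` key found")
    return new_text


def write_config(src, dst, model_name):
    """Copy the config at `src` to `dst`, swapping in `model_name`."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(with_baseline_model(text, model_name), encoding="utf-8")


# Example (placeholder names):
# write_config("eval_cfg.yaml", "my_eval_cfg.yaml", "my-model")
```

After writing the new file, update the config filepath so that the evaluation uses it, as described above.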


We welcome contributions.

If you find this codebase useful, please consider citing our work.
```
(Bibtex will go here with paper release)
```

