# Vision Tool Use Evaluation Framework

An evaluation framework for measuring how well multimodal large language models (MLLMs) use vision tools.

## Overview

This repository contains tools and evaluation scripts for testing how well vision-language models use tools such as Python image processing, web search, and a calculator to solve visual reasoning problems.

## How to Run

### Generate Single-Turn Model Responses

Use the `run_model_response_single_turn.sh` script to generate single-turn responses from a vision-language model:

```bash
./run_model_response_single_turn.sh
```

**Configuration Options:**
You can modify the following variables in `run_model_response_single_turn.sh`:
- `MODEL_NAME`: The model to test (e.g., "openai/gpt-5")
- `DATASET_PATH`: Path to the dataset JSON file
- `OUTPUT_DIR`: Directory to save results
- `TOOL_USE`: Enable/disable tool usage
- `SYSTEM_PROMPT_LEVEL`: System prompt level (low/medium/high)
- `MAX_TOOL_CALLS`: Maximum tool calls per response
- `NUM_WORKERS`: Number of parallel workers
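Because these values are edited by hand, a quick pre-flight check can catch typos before a long evaluation run. The following is a minimal sketch, assuming `SYSTEM_PROMPT_LEVEL` accepts only the three documented levels, `MAX_TOOL_CALLS` is a non-negative integer, and `NUM_WORKERS` is a positive integer; the `validate_eval_config` function is hypothetical and not part of this repository.

```bash
# Hypothetical helper, not part of this repository: sanity-check the values
# you intend to set in run_model_response_single_turn.sh before a long run.
# Usage: validate_eval_config DATASET_PATH SYSTEM_PROMPT_LEVEL MAX_TOOL_CALLS NUM_WORKERS
validate_eval_config() {
  # DATASET_PATH should point to an existing file.
  if [ ! -f "$1" ]; then
    echo "DATASET_PATH not found: $1" >&2
    return 1
  fi

  # SYSTEM_PROMPT_LEVEL must be one of the documented levels.
  case "$2" in
    low|medium|high) ;;
    *) echo "SYSTEM_PROMPT_LEVEL must be low, medium, or high (got: $2)" >&2
       return 1 ;;
  esac

  # Assumption: MAX_TOOL_CALLS is a non-negative integer (digits only).
  case "$3" in
    ''|*[!0-9]*) echo "MAX_TOOL_CALLS must be a non-negative integer (got: $3)" >&2
                 return 1 ;;
  esac

  # Assumption: NUM_WORKERS is a positive integer (digits only, no leading 0).
  case "$4" in
    ''|*[!0-9]*|0*) echo "NUM_WORKERS must be a positive integer (got: $4)" >&2
                    return 1 ;;
  esac

  return 0
}
```

For example, `validate_eval_config data/single_turn_data.json medium 5 8 && ./run_model_response_single_turn.sh` launches the script only if every check passes (the numbers are illustrative, not recommended defaults).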

**Example with custom parameters:**
```bash
# 1. Edit these variables inside run_model_response_single_turn.sh:
MODEL_NAME="anthropic/claude-3-sonnet"
DATASET_PATH="data/single_turn_data.json"
OUTPUT_DIR="my_eval_results"

# 2. Then run the script:
./run_model_response_single_turn.sh
```
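Editing the script by hand gets tedious when comparing several models. As a minimal sketch, the hypothetical helpers below (not part of this repository) write a per-model copy of the script with `MODEL_NAME` and `OUTPUT_DIR` rewritten, then run that copy. They assume each variable is assigned at the start of its own line (for example `MODEL_NAME="openai/gpt-5"`) and that override values contain no `|`, `&`, or `\` characters, which `sed` treats specially in this substitution.

```bash
# Hypothetical helpers, not part of this repository. Assumes each variable is
# assigned at the start of its own line in the run script (NAME=value or
# NAME="value") and that values contain no '|', '&', or '\' characters.

# Copy SRC to DEST, rewriting the listed NAME=VALUE assignments.
# Usage: make_configured_copy SRC DEST NAME=VALUE [NAME=VALUE ...]
make_configured_copy() {
  src=$1
  dest=$2
  shift 2
  cp "$src" "$dest" || return 1
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first '='
    value=${pair#*=}   # text after the first '='
    # '|' as the sed delimiter lets values contain '/' (model names, paths).
    sed "s|^${name}=.*|${name}=\"${value}\"|" "$dest" > "$dest.tmp" &&
      mv "$dest.tmp" "$dest" || return 1
  done
  chmod +x "$dest"
}

# Run SCRIPT once per model, each with its own copy and output directory.
# Usage: sweep_models SCRIPT MODEL [MODEL ...]
sweep_models() {
  script=$1
  shift
  for model in "$@"; do
    # Turn "org/model" into "org_model" for file and directory names.
    tag=$(printf '%s' "$model" | tr '/' '_')
    make_configured_copy "$script" "run_${tag}.sh" \
      MODEL_NAME="$model" OUTPUT_DIR="results_${tag}" || return 1
    "./run_${tag}.sh" || return 1
  done
}
```

For example, run `sweep_models run_model_response_single_turn.sh "openai/gpt-5" "anthropic/claude-3-sonnet"` from the directory containing the script. Models run one after another. Writing a fresh copy per model, rather than editing in place with `sed -i`, keeps the original script unchanged and sidesteps the different `-i` syntax of GNU and BSD `sed`.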


### Generate Multi-Turn Model Responses

Use the `run_model_response_multi_turn.sh` script to generate multi-turn responses from a vision-language model:

```bash
./run_model_response_multi_turn.sh
```

**Configuration Options:**
You can modify the following variables in `run_model_response_multi_turn.sh`:
- `MODEL_NAME`: The model to test (e.g., "openai/gpt-5")
- `DATASET_PATH`: Path to your dataset JSON file
- `OUTPUT_DIR`: Directory to save results
- `TOOL_USE`: Enable/disable tool usage
- `SYSTEM_PROMPT_LEVEL`: System prompt level (low/medium/high)
- `MAX_TOOL_CALLS`: Maximum tool calls per response
- `NUM_WORKERS`: Number of parallel workers
- `USE_GT_ANSWER`: Whether to use the ground-truth answers when constructing the multi-turn messages (default: true)

**Example with custom parameters:**
```bash
# 1. Edit these variables inside run_model_response_multi_turn.sh:
MODEL_NAME="anthropic/claude-3-sonnet"
DATASET_PATH="data/single_turn_data.json"
OUTPUT_DIR="my_eval_results"

# 2. Then run the script:
./run_model_response_multi_turn.sh
```
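`USE_GT_ANSWER` is the one option specific to multi-turn runs, so it can be useful to run the same configuration with it enabled and disabled and compare the results. The following is a minimal sketch; `run_gt_ablation` is a hypothetical helper, not part of this repository. It assumes `USE_GT_ANSWER` and `OUTPUT_DIR` are each assigned at the start of their own line in the script, that `USE_GT_ANSWER` accepts `true`/`false`, and that the output prefix contains no `|`, `&`, or `\` characters.

```bash
# Hypothetical helper, not part of this repository. Assumes USE_GT_ANSWER and
# OUTPUT_DIR are each assigned at the start of their own line in SCRIPT, and
# that USE_GT_ANSWER accepts the values true/false.
# Usage: run_gt_ablation SCRIPT OUTPUT_PREFIX
run_gt_ablation() {
  script=$1
  prefix=$2
  for gt in true false; do
    copy="run_multi_turn_gt_${gt}.sh"
    # Rewrite both assignments into a copy so the original script is untouched.
    sed -e "s|^USE_GT_ANSWER=.*|USE_GT_ANSWER=\"${gt}\"|" \
        -e "s|^OUTPUT_DIR=.*|OUTPUT_DIR=\"${prefix}_gt_${gt}\"|" \
        "$script" > "$copy" || return 1
    chmod +x "$copy"
    "./$copy" || return 1
  done
}
```

For example, `run_gt_ablation run_model_response_multi_turn.sh my_eval_results` should write the two runs to `my_eval_results_gt_true` and `my_eval_results_gt_false`, assuming the script saves its results under `OUTPUT_DIR`.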