## Usage

### 1. Start vLLM Server

```bash
# Basic usage: the model path is a positional argument.
# Replace <model_local_path>, <port>, and <served_model_name> with your own values.
vllm serve <model_local_path> \
  --port <port> \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 8192 \
  --trust-remote-code \
  --served-model-name "<served_model_name>"

# For InternVL3.5-38B (multimodal model)
vllm serve "OpenGVLab/InternVL3_5-38B-Pretrained" \
  --port 8002 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 8192 \
  --served-model-name "internVL3.5-38b"
```
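
Before starting an evaluation, it can help to confirm that the server is reachable. The sketch below queries the OpenAI-compatible `/v1/models` endpoint that vLLM exposes and returns the model names the server reports. The helper names (`models_url`, `parse_model_ids`, `list_served_models`) and the 5-second timeout are illustrative assumptions, not part of this repository.

```python
import json
import urllib.request


def models_url(api_base: str) -> str:
    """Build the model-listing URL from an api_base such as http://localhost:8002/v1."""
    return api_base.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(api_base: str, timeout: float = 5.0) -> list:
    """Return the served model names; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

For example, `list_served_models("http://localhost:8002/v1")` should include the name the server was started with; if it does not, requests that use a different model name are likely to be rejected.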

### 2. Run Evaluation

Once the server is running and `config.json` is set up (see Configuration Parameters), run the evaluation:

```bash
# Evaluate all enabled models
python eval.py --config config.json
```


## Configuration Parameters

The `config.json` file holds every parameter for the evaluation system. Each parameter is explained below.

#### `image_dir` (string)
Directory containing images referenced in the benchmark data.
```json
"image_dir": "images"
```

#### `results_dir` (string)
Directory where evaluation results will be saved.
```json
"results_dir": "results"
```

#### `modes` (array of strings)
Evaluation modes to run. Available options:
- `"text_only"`: Text-only math problems
- `"visual"`: Visual math problems with images
- `"scene"`: Scene-based problems (deprecated)
```json
"modes": ["visual", "text_only"]
```

#### `prompt_modes` (array of strings)
Prompt modes for image-based evaluations. Available options:
- `"implicit"`: Images are provided without an explicit question
- `"explicit"`: Images are provided together with a specific question
```json
"prompt_modes": ["implicit", "explicit"]
```
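
One way to read how `modes` and `prompt_modes` combine: since prompt modes apply to image evaluations, a text-only run presumably needs no prompt mode. The sketch below enumerates the resulting runs under that assumption. The function `expand_runs` and the rule that only `"text_only"` skips prompt modes are hypothetical; `eval.py` may combine the two settings differently.

```python
def expand_runs(modes, prompt_modes):
    """List (mode, prompt_mode) pairs; prompt_mode is None for text-only runs.

    Assumption: every mode other than "text_only" uses images and is
    therefore run once per prompt mode.
    """
    runs = []
    for mode in modes:
        if mode == "text_only":
            runs.append((mode, None))
        else:
            runs.extend((mode, prompt) for prompt in prompt_modes)
    return runs


runs = expand_runs(["visual", "text_only"], ["implicit", "explicit"])
for mode, prompt in runs:
    print(mode, prompt)
```

Under this reading, the two example settings above produce three runs: two visual runs (one per prompt mode) and one text-only run.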


#### `num_samples` (integer or null)
Number of samples to evaluate from the dataset. Set to `null` to evaluate all samples.
```json
"num_samples": null
```
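
The `null` convention can be sketched as follows. This is a minimal illustration that assumes the first `num_samples` items are taken in dataset order; the actual script may select samples differently (for example, at random), and `select_samples` is a hypothetical helper name.

```python
def select_samples(samples, num_samples=None):
    """Return all samples when num_samples is None, otherwise the first N.

    JSON `null` is loaded as Python `None` by the standard json module.
    """
    items = list(samples)
    if num_samples is None:
        return items
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative or null")
    return items[:num_samples]
```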

#### `metadata_dir` (string)
Directory containing metadata files for data categorization.
```json
"metadata_dir": "data/metadata"
```

### Data Filtering

#### `data_categories` (array of strings, optional)
Specific data categories to evaluate. If not specified, all categories are included.
Available categories:
- `"measurement"`: Measurement-related problems
- `"physical_metric"`: Physical metrics (speed, weight, etc.)
- `"ratio_percentage"`: Ratios and percentages
- `"signboard_and_icon"`: Signboards and icons
- `"temporal"`: Time-related problems
- `"other"`: Other categories
```json
"data_categories": [
  "measurement",
  "physical_metric",
  "ratio_percentage",
  "signboard_and_icon",
  "temporal",
  "other"
]
```

#### `data_subcategories` (array of strings, optional)
Specific subcategories to evaluate. Must be valid subcategories within the enabled categories.
```json
"data_subcategories": ["distance", "length_area_volume", "speed", "weight"]
```
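
The two filters can be thought of as successive restrictions on the dataset. The sketch below assumes each record is a dictionary with `category` and `subcategory` keys; those field names and the `filter_records` helper are assumptions for illustration, since the metadata layout is not described here.

```python
def filter_records(records, categories=None, subcategories=None):
    """Keep records matching the requested categories and subcategories.

    A filter set to None is treated as "not specified" and keeps everything.
    """
    kept = []
    for record in records:
        if categories is not None and record.get("category") not in categories:
            continue
        if subcategories is not None and record.get("subcategory") not in subcategories:
            continue
        kept.append(record)
    return kept


# Hypothetical records, used only to show the filtering behavior.
example = [
    {"id": 1, "category": "measurement", "subcategory": "distance"},
    {"id": 2, "category": "physical_metric", "subcategory": "speed"},
    {"id": 3, "category": "temporal", "subcategory": "duration"},
]
selected = filter_records(example, ["measurement", "physical_metric"], ["speed"])
```

Here `selected` contains only the record with `id` 2, because it is the only one that satisfies both filters.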

### Model Configuration

#### `evaluation_type` (string)
Type of models to evaluate. Options: `"api"`, `"local"`, `"vllm"`
```json
"evaluation_type": "vllm"
```

#### `api_models` (object, optional)
Configuration for API-based models (OpenAI, Claude, Gemini, etc.)
```json
"api_models": {
  "gpt-4o": {
    "enabled": false,
    "concurrency": 5,
    "description": "OpenAI GPT-4o model"
  }
}
```

#### `local_models` (object, optional)
Configuration for local models (InternVL, LLaVA, etc.)
```json
"local_models": {
  "internvl-chat": {
    "enabled": true,
    "concurrency": 1,
    "description": "Local InternVL model"
  }
}
```

#### `vllm_models` (object, required for vLLM evaluation)
Configuration for vLLM-served models. Each model has the following properties:

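- **`enabled`** (boolean): Whether to include this model in the evaluation
- **`concurrency`** (integer): Maximum number of concurrent requests; tune this to what your hardware can sustain
- **`api_base`** (string): vLLM server endpoint URL, including the `/v1` suffix
- **`model_id`** (string): Model identifier, such as the local model path (see the example below)
- **`description`** (string): Human-readable model description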

```json
"vllm_models": {
  "llama-4-17b-128e-instruct": {
    "enabled": true,
    "concurrency": 3,
    "api_base": "http://localhost:8000/v1",
    "model_id": "model local path",
    "description": "Meta Llama-4 Maverick 17B-128E Instruct via vLLM"
  }
}
```
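
To make the relationship between `evaluation_type`, the three model sections, and the `enabled` flag concrete, here is a minimal sketch of selecting the models to run. It assumes `evaluation_type` picks exactly one of `api_models`, `local_models`, or `vllm_models`, and that a missing `enabled` key means disabled; both are assumptions, and `enabled_models` is a hypothetical helper rather than a function from `eval.py`.

```python
SECTION_BY_TYPE = {
    "api": "api_models",
    "local": "local_models",
    "vllm": "vllm_models",
}


def enabled_models(config):
    """Return {name: spec} for enabled models of the configured evaluation type."""
    eval_type = config.get("evaluation_type")
    if eval_type not in SECTION_BY_TYPE:
        raise ValueError(f"unknown evaluation_type: {eval_type!r}")
    section = config.get(SECTION_BY_TYPE[eval_type], {})
    return {name: spec for name, spec in section.items() if spec.get("enabled", False)}


# Hypothetical config fragment mirroring the examples in this section.
config = {
    "evaluation_type": "vllm",
    "vllm_models": {
        "model-a": {"enabled": True, "concurrency": 3, "api_base": "http://localhost:8000/v1"},
        "model-b": {"enabled": False, "concurrency": 1, "api_base": "http://localhost:8001/v1"},
    },
}
print(sorted(enabled_models(config)))
```

With this fragment, only `model-a` is selected. In practice the dictionary would come from `json.load` on `config.json`.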





