# VisioMath Benchmark Evaluation

## Project Overview
This project provides a benchmark evaluation system for mathematical question answering with image processing capabilities. It processes JSON-formatted test data, evaluates model responses against ground truth answers, and calculates accuracy metrics.
We will release the full benchmark upon paper acceptance.

## Core Modules
1. `function.py` - Core functionality:
   - Strategy 1 and strategy 2
   - Answer extraction and evaluation
   - JSON data handling
   - Accuracy calculation

2. `test.py` - Evaluation framework:
   - Main execution flow
   - Batch processing with periodic saving
   - Result aggregation
   
3. `cot_data_gen.py` - Data generation:
   - Generate chain-of-thought (CoT) data
   - Shuffle the options in the prompt

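
The option shuffling listed for `cot_data_gen.py` can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the function name `shuffle_options`, its parameters, and the letter labels (A, B, C, ...) are assumptions made for the example.

```python
import random
import string


def shuffle_options(options, answer_label, seed=None):
    """Shuffle multiple-choice options and return the relabeled answer.

    Hypothetical helper: `options` holds the option contents in label order
    (A, B, C, ...), and `answer_label` is the correct label before shuffling.
    """
    labels = string.ascii_uppercase[: len(options)]
    rng = random.Random(seed)

    # Permute indices rather than contents so the correct option can be tracked.
    order = list(range(len(options)))
    rng.shuffle(order)

    shuffled = [options[i] for i in order]
    # Find where the originally correct index ended up and map it to a label.
    new_answer = labels[order.index(labels.index(answer_label))]
    return shuffled, new_answer
```

Passing a fixed `seed` makes the permutation reproducible, which helps when the same shuffled prompts need to be regenerated across runs.
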
## Usage Guide
1. Configure API keys in `function.py`
2. Run the evaluation:
   ```bash
   python test.py
   ```
3. Results will be saved to `baseline.json`
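
Batch processing with periodic saving to `baseline.json` could look something like the sketch below. The function names, the `save_every` parameter, and the record fields (`id`, `response`) are assumptions for illustration; the actual logic in `test.py` may differ.

```python
import json


def run_with_checkpoints(items, answer_fn, out_path="baseline.json", save_every=10):
    """Evaluate items one by one, writing partial results every `save_every` items.

    Hypothetical sketch: `answer_fn` stands in for the model call.
    """
    results = []
    for i, item in enumerate(items, start=1):
        results.append({"id": item["id"], "response": answer_fn(item)})
        if i % save_every == 0:
            _save(results, out_path)
    _save(results, out_path)  # final save covers any remainder
    return results


def _save(results, out_path):
    # Rewrite the whole file each time so it always holds valid JSON.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
```

Saving at intervals means an interrupted run loses at most `save_every - 1` responses.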

## Accuracy Metrics
- Overall accuracy
- Accuracy by image count (questions with 4 images vs. questions with 5 or more images)
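
As a rough sketch of how these metrics might be computed, the example below groups records by image count. The field names `num_images` and `correct` are assumptions, as is the premise that every question has at least 4 images; the actual implementation in `function.py` may differ.

```python
from collections import defaultdict


def accuracy_by_image_count(records):
    """Return overall accuracy plus accuracy for 4-image and 5+-image questions.

    Assumed record shape: {"num_images": int, "correct": bool}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        # Assumption: questions have at least 4 images, so "not 5+" means 4.
        group = "5+-image" if r["num_images"] >= 5 else "4-image"
        for key in ("overall", group):
            totals[key] += 1
            hits[key] += int(r["correct"])
    return {key: hits[key] / totals[key] for key in totals}
```

A group with no records is simply absent from the returned dictionary, which avoids dividing by zero.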

## Notes
- Ensure valid API keys are configured in `function.py` before running the evaluation
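
One possible way to catch a missing key early is a fail-fast check like the sketch below. It assumes the key is read from an environment variable; the variable name `VISIOMATH_API_KEY` and the helper `require_api_key` are hypothetical and not part of this repository.

```python
import os


def require_api_key(name="VISIOMATH_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing or empty.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"API key {name!r} is not set")
    return key
```

Calling such a check at the start of a run surfaces a configuration problem before any batch is processed.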
