
### Setup
The commands below set up the evaluation environment, which was tested on Debian 6.1.106-3 with A6000 GPUs:
```bash
conda create -n VAR-MATH python==3.9.0
conda activate VAR-MATH
cd latex2sympy
pip install -e . 
cd ..
pip install -r requirements.txt
```

### Generate the VAR-MATH data
```bash
python csv2json.py
```

The original CSV file is included so that you can edit the questions.
A CSV file with the suffix `_debug` is also included so that you can check the newly generated questions.
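
To check the generated questions quickly, you can preview the first few rows of each debug file. The sketch below is only an illustration: `preview_debug_csvs` is a hypothetical helper, not part of this repository, and it assumes the debug files are named `*_debug.csv` and sit in the directory you pass in.

```bash
# Hypothetical helper (not part of this repo): print the first rows of every
# file matching *_debug.csv in a directory. The naming pattern is an assumption.
preview_debug_csvs() {
  dir="${1:-.}"    # directory to search, defaults to the current one
  rows="${2:-5}"   # number of lines to show per file, header included
  for f in "$dir"/*_debug.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '== %s\n' "$f"
    head -n "$rows" "$f"
  done
}
```

For example, `preview_debug_csvs . 10` prints the header and the first nine rows of each debug file in the current directory.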

### Evaluate
```bash
sh eval_local_7b.sh # for 7B-parameter models
sh eval_local_32b.sh # for 32B-parameter models
sh eval_api.sh # fill in the API key and the base URL first, then run for 4 trials
```
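
If `eval_api.sh` performs one trial per invocation, the 4 trials can be scripted with a small loop. This is an assumption: check the script first, since it may already loop internally or overwrite its outputs between runs. `run_trials` and the `trial_N.log` file names are hypothetical and not part of this repository.

```bash
# Hypothetical helper (not part of this repo): run a command N times and
# save the output of each trial to its own log file.
run_trials() {
  n="$1"; shift   # number of trials; the remaining arguments are the command
  i=1
  while [ "$i" -le "$n" ]; do
    # Keep going if one trial fails, but report it on stderr.
    "$@" > "trial_${i}.log" 2>&1 || echo "trial $i failed" >&2
    i=$((i + 1))
  done
}
```

Under that assumption, `run_trials 4 sh eval_api.sh` would run the API evaluation four times and write `trial_1.log` through `trial_4.log` in the current directory.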

### Collect the results
```bash
python score_agg4.py # for large-scale models
python VAR_score_pass_16/score_analysis.py # for 7B- and 32B-parameter models
```