## Training

```
python grpo.py --model Qwen/Qwen2.5-3B-Instruct --dataset datasets/{dataset} --output_dir checkpoints/{model_name}
```

## Evaluation

### GSM questions combined into multi-subproblem problems (greedy decoding by default)

```
python gsm_eval.py --models {model_1} {model_2} ... --datasets {dataset_1} {dataset_2} ...
```

### MATH500 (greedy decoding by default)

```
python math500_eval.py --models {model_1} {model_2} ...
```

## Datasets

### Combining GSM questions into problems with subproblems

`combine_questions_into_long_horizon_resoning.py`
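
To illustrate the idea, here is a minimal sketch of how several independent questions could be merged into one problem with numbered subproblems. It is not taken from the script above: the helper names `combine_questions` and `group_into_problems`, the `"question"` / `"answer"` field names, and the prompt wording are all assumptions made for illustration.

```python
from typing import Dict, List


def combine_questions(items: List[Dict[str, str]]) -> Dict[str, object]:
    """Merge several question/answer records into one multi-part problem.

    Hypothetical helper: the "question" and "answer" keys are assumed,
    not read from the repository's actual data format.
    """
    if not items:
        raise ValueError("need at least one question to combine")
    # Number each question so the parts can be answered in order.
    parts = [
        f"Subproblem {i}: {item['question'].strip()}"
        for i, item in enumerate(items, start=1)
    ]
    prompt = "Solve the following subproblems in order.\n\n" + "\n\n".join(parts)
    # Keep one reference answer per subproblem, in the same order.
    answers = [item["answer"].strip() for item in items]
    return {"question": prompt, "answers": answers}


def group_into_problems(items: List[Dict[str, str]], k: int) -> List[Dict[str, object]]:
    """Split records into consecutive groups of k and combine each group."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # A trailing group with fewer than k records is dropped.
    usable = len(items) - len(items) % k
    return [combine_questions(items[i:i + k]) for i in range(0, usable, k)]
```

In this sketch, `k` controls the horizon length: a larger `k` yields longer problems with more subproblems per example.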

### Combining several datasets (i.e. for the curriculum)

`combine_datasets.py`
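
As a rough illustration of what a curriculum-style combination might look like, the sketch below concatenates datasets stage by stage, so that earlier stages come first in the combined data. This is an assumption about the general technique, not a description of what `combine_datasets.py` does; the function name `combine_for_curriculum`, the added `"stage"` field, and the in-memory list-of-dicts format are hypothetical.

```python
import random
from typing import Dict, List


def combine_for_curriculum(
    stages: List[List[Dict[str, object]]],
    shuffle_within_stage: bool = True,
    seed: int = 0,
) -> List[Dict[str, object]]:
    """Concatenate datasets in stage order, optionally shuffling inside each stage.

    Hypothetical helper: `stages` is assumed to be ordered from the first
    curriculum stage to the last.
    """
    rng = random.Random(seed)
    combined: List[Dict[str, object]] = []
    for stage_index, stage in enumerate(stages):
        # Copy each example and tag it with its stage; inputs are not mutated.
        examples = [dict(example, stage=stage_index) for example in stage]
        if shuffle_within_stage:
            rng.shuffle(examples)
        combined.extend(examples)
    return combined
```

Shuffling only within a stage keeps the stage order intact while avoiding a fixed example order inside each stage. Note that a trainer which reshuffles the whole dataset would undo this ordering.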
