The environment can be installed directly with conda using the provided environment.yml file.
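
A minimal sketch of the installation step, assuming conda is already on your PATH. The helper name create_env is hypothetical and not part of this repository; only environment.yml comes from the instructions above. The environment name is whatever the name: field in environment.yml specifies.

```shell
# Hypothetical helper (not shipped with this repo): create the conda
# environment from a YAML file, defaulting to environment.yml.
create_env() {
    env_file="${1:-environment.yml}"
    if [ ! -f "$env_file" ]; then
        echo "Cannot find $env_file in $(pwd)" >&2
        return 1
    fi
    # conda reads the environment name from the file's `name:` field
    conda env create -f "$env_file"
}
```

Call create_env from the repository root, then activate the new environment with conda activate followed by the name defined in environment.yml.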


Running Our Code:
1. Preference Data Generation
    i. To generate 26k paired examples from the step-annotated PRM800k: python preference_data.py --curriculum
    ii. To generate and augment 220k paired examples from the step-annotated PRM800k: python preference_data_with_aug.py --curriculum

Note: Adding the --curriculum flag automatically divides the data by curriculum; without it, all the data is kept together.
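
The two modes described in the note can be sketched as a small wrapper. The function name generate_pairs and its second argument are hypothetical; only the script names and the --curriculum flag come from the steps above, and the sketch assumes it is run from the directory containing the scripts.

```shell
# Hypothetical wrapper (not part of this repo) around the data generation scripts.
#   $1: preference_data.py or preference_data_with_aug.py
#   $2: "yes" (default) passes --curriculum; any other value omits it
generate_pairs() {
    script="$1"
    split="${2:-yes}"
    if [ "$split" = "yes" ]; then
        # Data is divided by curriculum
        python "$script" --curriculum
    else
        # All data is kept together
        python "$script"
    fi
}
```

For example, generate_pairs preference_data.py runs the 26k generation with the curriculum split, and generate_pairs preference_data_with_aug.py no runs the augmented generation without it.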

2. Training Using the Generated Data
The training script is provided in CL_train.sh. To run the training: bash CL_train.sh



Evaluation Benchmarks:
1. PRMBench (code provided). Instructions for running it are on its GitHub page: https://github.com/ssmisya/PRMBench
2. ThinkPRM (code provided). Instructions for running it are on its GitHub page: https://github.com/mukhal/thinkprm