This codebase is heavily based on verl (https://github.com/volcengine/verl). The main changes are in the following files and directories:

- ./verl/trainer/ppo/ray_trainer.py
- ./verl/utils/reward_score/__init__.py
- ./NuRL/*