Our code includes the following main components:
- src/
- verl/examples/grpo_trainer/run_gsm8k_lora.sh
- verl/verl/workers/fsdp_workers.py
- verl/verl/workers/reward_manager/cer_manager.py
- verl/verl/trainer/main_ppo.py
- verl/verl/trainer/ppo/ray_trainer.py
- verl/verl/workers/actor/dp_actor.py

All other code comes from the **VERL** framework.  
Any content, implementation details, or information contained in the VERL framework code is **not related to the authors** of this project.


# Environment


Follow the official VERL installation instructions.

## Pre-requisites

Install the following prerequisites:

- CUDA: version >= 12.4
- cuDNN: version >= 9.8.0
- Apex

CUDA 12.4 or later is recommended, matching the version used in the Docker image. For other CUDA versions, refer to NVIDIA's official website.
```
# Change to any directory you like; the verl source code directory is not recommended

wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cuda-toolkit-12-4
update-alternatives --set cuda /usr/local/cuda-12.4
```
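
After the toolkit is installed, it can help to confirm that the `nvcc` on your `PATH` meets the 12.4 minimum. The following is a minimal sketch, not part of VERL: `version_ge` and `parse_nvcc_release` are hypothetical helper names introduced here, and the check assumes GNU `sort -V` is available (as it is on Ubuntu).

```shell
# Hypothetical helper: succeed when version $1 is >= version $2.
version_ge() {
    # sort -V orders version strings; if $2 sorts first (or ties), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number (for example "12.4").
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    cuda_version="$(nvcc --version | parse_nvcc_release)"
    if version_ge "$cuda_version" "12.4"; then
        echo "CUDA $cuda_version satisfies the >= 12.4 requirement"
    else
        echo "CUDA $cuda_version is older than 12.4" >&2
    fi
else
    # Assumption: the toolkit was installed under /usr/local/cuda-12.4 as above.
    echo "nvcc not found on PATH; try adding /usr/local/cuda-12.4/bin" >&2
fi
```

If `nvcc` is not found, the toolkit's `bin` directory is probably not on your `PATH` yet.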

cuDNN can be installed with the following commands. For other cuDNN versions, refer to NVIDIA's official website.
```
# Change to any directory you like; the verl source code directory is not recommended

wget https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cudnn-cuda-12
```

NVIDIA Apex is required for Megatron-LM and FSDP training. You can install it with the following commands, but note that this step can take a very long time. Setting the `MAX_JOBS` environment variable speeds up the build, but do not set it too high, or the build may exhaust memory and hang your machine.

```
# Change to any directory you like; the verl source code directory is not recommended
git clone https://github.com/NVIDIA/apex.git && \
cd apex && \
MAX_JOBS=32 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
```
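
One way to pick a `MAX_JOBS` value that will not exhaust memory is to cap it by both the CPU core count and the installed RAM. The sketch below is illustrative only: `pick_max_jobs` is a hypothetical helper, and the budget of roughly 4 GiB per compile job is an assumption rather than a measured figure, so adjust it for your machine.

```shell
# Hypothetical helper: print a job count given CPU cores ($1) and RAM in GiB ($2).
# Assumption: each parallel compile job may need about 4 GiB of memory.
pick_max_jobs() {
    local cores="$1"
    local mem_gib="$2"
    local by_mem=$(( mem_gib / 4 ))
    # Use the smaller of the two limits, but never fewer than one job.
    local jobs=$(( cores < by_mem ? cores : by_mem ))
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# Read the core count and total memory (MemTotal is reported in kB on Linux).
cores="$(nproc)"
mem_gib="$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)"

export MAX_JOBS="$(pick_max_jobs "$cores" "$mem_gib")"
echo "Using MAX_JOBS=$MAX_JOBS"
```

With `MAX_JOBS` exported this way, the `MAX_JOBS=32` prefix can be dropped from the `pip install` command.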

## Install dependencies

1. To manage the environment, we recommend using conda:
```
conda create -n verl python==3.10
conda activate verl
```
2. Then run the installation script provided in verl:
```
# Make sure you have activated the verl conda env
# If you need to run with Megatron
bash scripts/install_vllm_sglang_mcore.sh
# Or if you simply need to run with FSDP
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
```

If you encounter errors in this step, please check the script and manually follow the steps in the script.

## Install verl
To install the latest version of verl, clone the repository and install it from source. You can then modify the code to customize your own post-training jobs.
```
# Clone the official verl repository
git clone https://github.com/volcengine/verl.git
cd verl
pip install --no-deps -e .
```


# Run our script


```
bash verl/examples/grpo_trainer/run_gsm8k_lora.sh
```