Our code is directly modified from the VeRL codebase (DAPO implementation, commit 01ef7184821d0d7844796ec0ced17665c1f50673).

The current codebase is provided for review purposes only and is largely unorganized. We are cleaning up the code and plan to release it publicly in the near future.

The core algorithm is implemented in `verl/workers/actor/dp_actor.py`.

Running the experiments requires a Slurm cluster. Reference scripts are provided in `7B_running.sh` and `32B_running.sh`.
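
Launching a run might look like the sketch below. It assumes the reference scripts are Slurm batch scripts that can be submitted with `sbatch`, which this README does not confirm; check the script headers first. The `submit` helper and the `DRY_RUN` variable are hypothetical names introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Assumption: the reference scripts are submitted with `sbatch`.
submit() {
    local script="$1"
    # Print the command instead of running it when DRY_RUN=1 is set
    # or when `sbatch` is not available on this machine.
    if [ "${DRY_RUN:-0}" = "1" ] || ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch $script"
    else
        sbatch "$script"
    fi
}
```

For example, `DRY_RUN=1 submit 7B_running.sh` prints the command without submitting anything, and `submit 32B_running.sh` submits the job on a machine where `sbatch` is installed.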