## Train
```
cd verl-0.4.1
bash examples/ppo_trainer/run_sapo.sh
```

## Main code
```
verl/trainer/ppo/core_algos.py:
    Line 199 for compute_gae_advantage_return
    Line 883 for compute_policy_loss
    Line 1172 for compute_value_loss
```
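
Judging by its name, `compute_gae_advantage_return` computes generalized advantage estimation (GAE). For orientation, here is a minimal sketch of textbook GAE over a single trajectory in plain Python. It is not the code at the line listed above: the function name `gae_advantages_and_returns`, its defaults, and the assumption that the value after the final step is 0 are all hypothetical and chosen for illustration only.

```
from typing import List, Tuple


def gae_advantages_and_returns(
    rewards: List[float],
    values: List[float],
    gamma: float = 1.0,
    lam: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Generic GAE for one trajectory (illustrative, not the repository's code).

    rewards[t] is the reward at step t and values[t] is the critic's V(s_t).
    Assumption: the value after the last step is 0 (the episode terminates).
    """
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    # The return target for the critic is advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = gae_advantages_and_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.5])
    print(adv, ret)
```

With `gamma = lam = 1`, the returns reduce to the sum of future rewards; with `lam = 0`, each advantage is the one-step TD error.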