## Supervised Fine-Tuning (SFT)

To reproduce the initial SFT policies from the paper, use the `launch.sh` script to train the policies at the different model sizes.