## Prepare the Environment
```bash
conda create -n acpo python=3.10
conda activate acpo
pip install -e ./verl
pip install -e .
```
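
After installation, a quick import check can confirm the environment is usable. This is a minimal sketch, not part of the repository: it assumes the editable install of `./verl` exposes an importable module named `verl`, and the helper `check_importable` is hypothetical. Adjust the module names to match what you actually need.

```python
import importlib.util


def check_importable(module_names):
    """Return {name: bool} telling whether each top-level module can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # "verl" is an assumed import name for the package installed from ./verl.
    for name, ok in check_importable(["verl"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated `acpo` environment; a `MISSING` line suggests the corresponding `pip install -e` step did not complete.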

## SFT for Cold Start

```bash
bash ./scripts/train/sft.sh
```

## Prepare Data for ACPO Training
```bash
# Writes the dataset as parquet files to data/*.parquet.
python scripts/data/acpo_dataset.py
```
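
To confirm that the script produced its output, you can list the parquet files it wrote. The following is a small sketch using only the standard library. It assumes the files land directly under `data/` relative to the current directory, as the comment above indicates, and the helper name `list_parquet_files` is hypothetical.

```python
from pathlib import Path


def list_parquet_files(data_dir="data"):
    """Return sorted (file name, size in bytes) pairs for *.parquet files in data_dir."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(data_dir).glob("*.parquet")
    )


if __name__ == "__main__":
    files = list_parquet_files()
    if not files:
        print("No parquet files found in data/; did the dataset script finish?")
    for name, size in files:
        print(f"{name}: {size} bytes")
```

An empty listing or zero-byte files would suggest the data preparation step should be rerun before training.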

## Training
First set the data paths, then run:
```bash
bash ./scripts/train/train.sh
```

## Evaluation
First set the model paths, then run:
```bash
bash ./scripts/eval/eval_model.sh
```