## 🔧How to use
**Step1** Create conda environment and Install other dependencies.
1. Create BC conda environment (LLaMA Factory).
```shell
conda create --name BC python=3.11 -y
conda activate BC
cd BC 
pip install -e ".[torch,metrics]"
```
2. Create RL conda environment (verl).
```shell
# RL environment (verl)
conda create --name RL python=3.11 -y
conda activate RL
cd RL
pip3 install -e .[vllm]
pip install -r requirements.txt
```

**Step2** Preparing the Model API

1. (**Must**) Set up your OPENAI key in config/gpt_4o.yaml (Evaluation)
```shell
api_key: "Your OPENAI key"
api_url: "API URL"
```

2. (**Must**) Set up your key in config/qwen2.5_72b_instruct.yaml (Reward Model)
```shell
api_key: "Your key"
api_url: "API URL"
# We also recommend using vLLM. And we use HTTP server that implements OpenAI’s Completions and Chat API.
# Set up your vLLM settings in config/*.yaml
```
**Step3** Behavior Cloning Training
```shell
conda activate BC
cd BC
## (Must) Firstly set the bc_training_data_path in ./BC/data/dataset_info.yaml
sh train.sh
```

**Step4** RL Training
```shell
conda activate RL
cd RL
## (Must) Firstly, translate the rl training data into ".parquet" format by using the script in ./RL/example/data_preprocess/sotopia.py
sh sotopia_ampo_llama3.1_8b.sh
sh sotopia_ampo_qwen2.5_7b.sh
```

**Step5** Evaluation and Inference
```shell
conda activate RL
cd RL
sh infer.sh
## show result
python result.py --env sotopia --data_path your_result_path
```