## Training Instructions

This repository implements a two-stage training pipeline: Behavioral Cloning (BC) followed by Reinforcement Learning (LHRL-VGR).

### Step 1: Behavioral Cloning (BC) Training

1. **Setup Environment**
   ```bash
   git clone https://github.com/hiyouga/LLaMA-Factory.git
   cd LLaMA-Factory
   pip install -r requirements.txt
   ```

2. **Run BC Training**
   ```bash
   python src/train_bash.py \
     --model_name_or_path /path/to/your/base_model \
     --dataset_dir /path/to/your/data \
     --train_file data/BC/train.jsonl \
     --stage sft \
     --template default \
     --output_dir outputs/bc_model
   ```

### Step 2: LHRL-VGR Training

1. **Install Dependencies**
   ```bash
   pip install -r requirements_torch260_vllm.txt
   ```

2. **Configure Model Path**  
   Edit `LHRL-VGR/scripts/rlvr_config_mixed.yaml`:
   ```yaml
   pretrain: /path/to/your/trained_bc_model
   ```

3. **Run Reinforcement Learning**
   ```bash
   cd LHRL-VGR/scripts
   bash run_pipeline.sh
   ```
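
The config edit in step 2 can also be scripted, which helps when you retrain the BC model into a different directory. The following is a minimal sketch using only the standard library. It assumes `pretrain` appears as a top-level (unindented) key in `rlvr_config_mixed.yaml`, as in the snippet above, and that the BC model was written to `outputs/bc_model`; the helper name `set_pretrain_path` is hypothetical and not part of this repository.

```python
import re
from pathlib import Path


def set_pretrain_path(config_text: str, model_path: str) -> str:
    """Return config_text with the value of the top-level `pretrain:` key replaced.

    Assumption: `pretrain` is an unindented key on its own line.
    """
    pattern = re.compile(r"^pretrain:.*$", flags=re.MULTILINE)
    # A callable replacement avoids backslash-escape surprises in model_path.
    new_text, count = pattern.subn(
        lambda _match: f"pretrain: {model_path}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no top-level 'pretrain:' key found")
    return new_text


if __name__ == "__main__":
    # Run from the repository root.
    config_file = Path("LHRL-VGR/scripts/rlvr_config_mixed.yaml")
    bc_model = "outputs/bc_model"  # assumption: the --output_dir used in Step 1
    if config_file.is_file():
        updated = set_pretrain_path(config_file.read_text(encoding="utf-8"), bc_model)
        config_file.write_text(updated, encoding="utf-8")
        print(f"pretrain set to {bc_model}")
    else:
        print(f"config not found: {config_file}")
```

The line-based replacement leaves every other line of the file, including comments, untouched. If the key turns out to be nested in your copy of the config, edit the file by hand instead.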


## Important Notes

- **Initial Version Notice**: This is an initial release of the code. A refined, fully documented version will follow.
- **Data Requirements**:
  - BC training requires `data/BC/train.jsonl`
  - LHRL-VGR training requires `data/LHRL/sotopia_2_stage.jsonl`
- **Hardware Recommendations**:
  - 8×NVIDIA A100-80GB GPUs recommended
  - If your GPUs have less memory, reduce the batch sizes in the configuration files
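
Before launching either stage, it can save time to confirm that both data files exist and parse cleanly. The sketch below is not part of this repository: the function names are hypothetical, and it assumes only that each file holds one JSON value per line, as the `.jsonl` extension suggests. It does not check field names, since the expected schema is not documented here.

```python
import json
from pathlib import Path

# Data files named in the notes above, relative to the repository root.
REQUIRED = ["data/BC/train.jsonl", "data/LHRL/sotopia_2_stage.jsonl"]


def count_jsonl_records(path: Path) -> int:
    """Count records in a JSONL file; raise ValueError on the first invalid line."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count


def check_data_files(paths):
    """Return {path: record_count}; raise FileNotFoundError if a file is missing."""
    result = {}
    for p in map(Path, paths):
        if not p.is_file():
            raise FileNotFoundError(f"missing data file: {p}")
        result[str(p)] = count_jsonl_records(p)
    return result


if __name__ == "__main__":
    try:
        for name, n in check_data_files(REQUIRED).items():
            print(f"{name}: {n} records")
    except (FileNotFoundError, ValueError) as err:
        print(f"data check failed: {err}")
```

Run it from the repository root so the relative paths resolve.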

For detailed parameter configurations, see:
- BC training: LLaMA-Factory documentation
- RL training: `LHRL-VGR/scripts/rlvr_config_mixed.yaml`
