# Guarded Policy Optimization with Imperfect Online Demonstrations

```bash
# Create virtual environment
conda create -n ts2c python=3.7
conda activate ts2c

# Install basic dependencies
pip install -e .
```
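
Before training, it can help to confirm that the active interpreter is the Python 3.7 one created above. The following is a minimal sketch, not part of the repository: `version_matches` is a hypothetical helper introduced here for illustration, and the check assumes `python` on your `PATH` is the one from the `ts2c` environment.

```shell
# Hypothetical helper (not part of this repository): succeeds when the
# version string in $1 equals, or starts with, the major.minor prefix in $2.
version_matches() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the active interpreter for its version, e.g. "3.7.16".
# Falls back to "unknown" if no python executable is found.
current="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null || echo unknown)"

if version_matches "$current" 3.7; then
  echo "Python $current matches the 3.7 environment"
else
  echo "Python $current does not match 3.7; is the ts2c environment active?"
fi
```

If the second message appears, running `conda activate ts2c` again in the current shell is the first thing to try.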

# Training TS2C
1. Change to the training script directory.
```bash
cd ts2c/training_script/
```
2. Train the teacher policy.
```bash
python train_ppo_baseline.py
python train_sac_baseline.py
```
3. Use the teacher policy checkpoint and `egpo_utils/parse_expert.py` to generate the policy weight file.
4. Train with teacher-student shared control.
```bash
python train_egpo.py \
    --exp-name ts2c \
    --local-dir ../results \
    --expert-policy-type ppo \
    --expert-level 30 \
    --value-takeover \
    --value-from-scratch \
    --ensemble \
    --egpo-ensemble \
    --start-seed 00 \
    --warmup-ts 50000 \
    --ckpt-freq 10 \
    --no-cql \
    --warmup-noise 0.3 \
    --num-gpus 1
```
