# Safe Set Guided State-wise Constrained Policy Optimization

## Installation
Install [mujoco_py](https://github.com/openai/mujoco-py); see the mujoco_py documentation for details. Note that mujoco_py **requires Python 3.6 or greater**.

Create the conda environment:
```
conda create --name s3po --file requirements.txt
```

If conda reports an error while creating the environment, first comment out the unsupported packages, then install those packages manually with `pip install`. For example:
```
conda create --name s3po 

conda activate s3po

pip install -r requirements.txt
```

Because of a recent Cython update, pin Cython below version 3 after installing mujoco-py; otherwise an error is reported when importing mujoco.
```
pip install "cython<3"
```
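To confirm the pin took effect, the installed version can be checked. The following is a minimal sketch: `cython_below_3` is a hypothetical helper name, not part of this repository, and it only inspects the major version number.

```shell
# Hypothetical helper: succeed only if the given version string is below 3.
cython_below_3() {
    major="${1%%.*}"    # text before the first dot, e.g. "0" from "0.29.36"
    [ "$major" -lt 3 ] 2>/dev/null
}

# Example usage (assumes Cython is importable in the active environment):
# version="$(python -c 'import Cython; print(Cython.__version__)')"
# cython_below_3 "$version" && echo "Cython $version is OK"
```

If the check fails, rerun the `pip install "cython<3"` command above.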

Then install Safety Gym Arm:

```
cd safety-gym-arm

pip install -e .
```

## Policy Training
For example, to train S3PO:
```
cd train/s3po

conda activate s3po

python s3po.py --task goal1_noconti --seed 1
```
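To compare runs across random seeds, the training command above can be wrapped in a loop. This is a minimal sketch that uses only the `--task` and `--seed` flags shown above; the function name `train_seeds`, the `DRY_RUN` variable, and the seed values are illustrative assumptions, not part of this repository.

```shell
# Hypothetical helper: run (or only print) S3PO training for several seeds.
# Assumes it is called from train/s3po with the s3po environment active.
train_seeds() {
    task="$1"; shift
    for seed in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command without running it.
            echo "python s3po.py --task $task --seed $seed"
        else
            python s3po.py --task "$task" --seed "$seed"
        fi
    done
}

# Example: preview the commands for three seeds.
# DRY_RUN=1 train_seeds goal1_noconti 1 2 3
```

The runs execute one after another; drop `DRY_RUN=1` to actually start training.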

## S3PO Policy Video Production
After training has finished:
```
python s3po_video.py --model_path logs/<s3po log>/<s3po log specific seed>/pyt_save/model.pt --task <experiment name> --video_name <video name> --max_epoch <max epoch>
```

## Plot the Training Curve
```
cd train
mkdir comparison
# Copy the logs you want to visualize into the comparison/ folder
python utils/plot.py comparison/ --title test --reward --cost
```
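The copy step above can be scripted. This is a minimal sketch: `collect_logs` is a hypothetical helper, and the source path in the example is an assumption about where training writes its logs, so adjust it to your layout.

```shell
# Hypothetical helper: copy one or more log directories into a comparison folder.
collect_logs() {
    dest="$1"; shift
    mkdir -p "$dest"
    for log_dir in "$@"; do
        # Each log directory is copied as a subfolder of the destination.
        cp -r "$log_dir" "$dest"/
    done
}

# Example (run from train/; the source path is an assumption):
# collect_logs comparison s3po/logs/<s3po log>
```

Once the logs are copied, run the `utils/plot.py` command above on the destination folder.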