# State-wise Constrained Policy Optimization 


## Environment Installation

```
conda create --name safebench --file requirements.txt
```
If conda reports an error while creating the environment, comment out the unsupported packages in `requirements.txt`, then install them manually with `pip install`.

## Simulator Installation

Install [mujoco_py](https://github.com/openai/mujoco-py); see the mujoco_py documentation for details. Note that mujoco_py **requires Python 3.6 or greater**.
Because of a recent Cython update, pin Cython below version 3 after installing mujoco_py; otherwise an error is reported when importing mujoco_py.
```
pip install "cython<3"
```
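To confirm that the pin took effect, a quick check along the following lines may help. This is a minimal sketch: the helper name `cython_version_ok` is hypothetical, and it assumes that the `python` on your `PATH` belongs to the environment where mujoco_py was installed.

```shell
# Hypothetical helper: succeeds when the version string's major number is below 3.
cython_version_ok() {
  major="${1%%.*}"
  [ -n "$major" ] && [ "$major" -lt 3 ] 2>/dev/null
}

# Ask the active Python for its Cython version (empty if Cython is not importable).
installed="$(python -c 'import Cython; print(Cython.__version__)' 2>/dev/null || true)"

if cython_version_ok "$installed"; then
  echo "Cython $installed is below version 3"
else
  echo "Cython '$installed' is missing or not below 3; run: pip install \"cython<3\""
fi
```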

Afterwards, install Safety Gym Arm:

```
cd safety-gym-arm

pip install -e .
```


## Policy Training
For example, to train SCPO:
```
cd train/scpo

conda activate safebench

python scpo.py --task goal8_noconti --seed 1
```
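To run the same task over several seeds, the command above can be wrapped in a small loop. The sketch below uses only the `--task` and `--seed` flags shown in this README; the `run_seeds` function and the `TRAIN_CMD` variable are hypothetical conveniences, not part of the repository.

```shell
# Run SCPO on one task for several seeds, one after another.
# TRAIN_CMD (hypothetical) defaults to the training command from this README;
# set TRAIN_CMD=echo for a dry run that only prints the arguments.
run_seeds() {
  task="$1"; shift
  for seed in "$@"; do
    ${TRAIN_CMD:-python scpo.py} --task "$task" --seed "$seed"
  done
}

# Usage (from train/scpo, with the safebench environment active):
# run_seeds goal8_noconti 1 2 3
```

The runs execute sequentially, so each seed finishes before the next one starts.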

## SCPO Policy Video Production
After training has finished:
```
python scpo_video.py --model_path logs/<scpo log>/<scpo log specific seed>/pyt_save/model.pt --task <experiment name> --video_name <video name> --max_epoch <max epoch>
```

## Plot the Training Curve 
```
cd train
mkdir comparison
# copy the logs you want to visualize into the comparison/ folder
python utils/plot.py comparison/ --title test --reward --cost
```
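The copy step can be scripted when several runs are compared. The following is a minimal sketch: `collect_logs` is a hypothetical helper, and the log path in the usage comment is a placeholder that assumes the logs were written under `scpo/logs`.

```shell
# Hypothetical helper: copy each given log directory into a comparison
# folder, creating the folder if it does not exist yet.
collect_logs() {
  dest="$1"; shift
  mkdir -p "$dest"
  for log_dir in "$@"; do
    cp -r "$log_dir" "$dest"/
  done
}

# Usage (from train/; replace the placeholder with your own log directory):
# collect_logs comparison scpo/logs/<scpo log>
# python utils/plot.py comparison/ --title test --reward --cost
```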