# Safety-starter-agents model-free baselines
This repository contains the model-free baselines used to obtain the results in the "Benchmarking Safe Exploration" paper,
as well as experimental implementations of SAC and SAC-Lagrangian that were not used in the paper. The paper baselines are:
    
- PPO
- TRPO
- PPO-Lagrangian
- TRPO-Lagrangian
- CPO 

## Python version and installation
The safety-starter-agents repository uses an old version of TensorFlow (1.13.1), which cannot be installed
on Python versions newer than 3.6.
It also uses MPI for parallelization, which requires specific system libraries (mpi, libmpich).

### Create the environment and get the right Python version

    conda create --name safe_starter_agents
    conda activate safe_starter_agents
    conda install python=3.6.2
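
Before installing, it can help to confirm that the activated environment really provides a compatible interpreter. The snippet below is a minimal sketch that only encodes the constraint stated above (nothing newer than Python 3.6); the function name `is_supported_python` is hypothetical and not part of the repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) version is not newer than 3.6.

    Hypothetical helper: it only encodes the README's stated limit for
    TensorFlow 1.13.1 and does not check anything else.
    """
    return tuple(version_info[:2]) <= (3, 6)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print("Python " + current + " should be compatible.")
    else:
        print("Python " + current + " is newer than 3.6; recreate the conda environment.")
```

Run it with the `safe_starter_agents` environment activated, before `pip install -e .`.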

### Installing safety-starter-agents

    pip install -e .

### Installation step for use with the benchmark environments in rl_simulator
Install the rl_simulator package so that its environments are registered in gym:

    # In rl_simulator
    pip install -e .

### Install safety_gym
Clone [Safety_gym](https://github.com/openai/safety-gym) and install it locally.

## Running a baseline for a specific env
Example: run CPO on pendulum_safe-v0 and save the results in the current directory:

    cd safety-starter-agents/safe_rl/pg
    python run_agent.py --agent cpo --env pendulum_safe-v0 --output_dir .
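
To launch several baselines on the same environment, the command above can be generated in a loop. The following is a sketch, not part of the repository: `build_command` and `run_agents` are hypothetical helpers, only the `--agent`, `--env` and `--output_dir` flags shown above are used, and any agent name other than `cpo` (for example `ppo`) is an assumption about what `run_agent.py` accepts.

```python
import subprocess
import sys


def build_command(agent, env, output_dir="."):
    """Build the run_agent.py command line for one agent (hypothetical helper)."""
    return [
        sys.executable, "run_agent.py",
        "--agent", agent,
        "--env", env,
        "--output_dir", output_dir,
    ]


def run_agents(agents, env, output_dir=".", dry_run=True):
    """Print (dry_run=True) or execute the command for each agent in turn."""
    commands = [build_command(agent, env, output_dir) for agent in agents]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Must be executed from safety-starter-agents/safe_rl/pg.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # "cpo" comes from the example above; "ppo" is an assumed agent name.
    run_agents(["cpo", "ppo"], "pendulum_safe-v0")
```

With `dry_run=True` the commands are only printed, which makes it easy to check them before starting long training runs.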

## Log outputs
To be consistent with the ramp structure, results are saved per epoch in a separate folder for each run under
**{output_dir}**/experiments/**{env}**/**{agent}**/**{seed}**


For now, only **reward** and **cost** are saved per epoch.
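
As an illustration of the folder layout described in this section, the results folder for a given run can be derived as follows. This is a sketch: `results_dir` is a hypothetical helper, and it assumes the seed appears in the path as a plain integer. The names and formats of the files inside the folder are not covered here.

```python
import os


def results_dir(output_dir, env, agent, seed):
    """Return {output_dir}/experiments/{env}/{agent}/{seed} (hypothetical helper)."""
    # Assumption: the seed is written as a plain integer in the folder name.
    return os.path.join(output_dir, "experiments", env, agent, str(seed))


if __name__ == "__main__":
    print(results_dir(".", "pendulum_safe-v0", "cpo", 0))
```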
