# Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

This repository contains the code for reproducing the CarRacing experiments in our anonymous submission titled "Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning".

[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)

## Usage 
To install all dependencies with Anaconda, run `conda env create -f environment.yml`, then activate the environment with `source activate cap-planet` (or `conda activate cap-planet` on conda 4.4 and later).

To replicate the CarRacing experiments, run the command for the method you want to evaluate:

**CAP**
```
python main.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained --penalize-uncertainty \
--learn-kappa --penalty-kappa 0.1 \
--id CarRacing-cap --seed 1
```

**CAP with fixed kappa**
```
python main.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained --penalize-uncertainty \
--penalty-kappa 1.0 \
--id CarRacing-kappa1 --seed 1
```

**CCEM**
```
python main.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--cost-constrained \
--id CarRacing-ccem --seed 1
```

**CEM**
```
python main.py --env CarRacingSkiddingConstrained-v0 \
--cost-limit 0 --binary-cost \
--id CarRacing-cem --seed 1
```
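The four commands above share the same environment and cost flags and differ only in the method-specific flags. As a minimal sketch of how they relate, the script below rebuilds each command from a flag table and prints it for several seeds. The names `COMMON`, `METHOD_FLAGS`, and `build_command`, as well as the seed list, are assumptions made for illustration and are not part of this repository; only the flags themselves are taken from the commands above.

```python
import shlex

# Flags shared by every run shown in this README.
COMMON = ["--env", "CarRacingSkiddingConstrained-v0",
          "--cost-limit", "0", "--binary-cost"]

# Method-specific flags, copied from the commands above.
# The dictionary keys double as the suffix of the --id value.
METHOD_FLAGS = {
    "cap": ["--cost-constrained", "--penalize-uncertainty",
            "--learn-kappa", "--penalty-kappa", "0.1"],
    "kappa1": ["--cost-constrained", "--penalize-uncertainty",
               "--penalty-kappa", "1.0"],
    "ccem": ["--cost-constrained"],
    "cem": [],
}


def build_command(method, seed):
    """Return the argument list for one run (hypothetical helper)."""
    return (["python", "main.py"] + COMMON + METHOD_FLAGS[method]
            + ["--id", f"CarRacing-{method}", "--seed", str(seed)])


if __name__ == "__main__":
    for method in METHOD_FLAGS:
        for seed in (1, 2, 3):  # seed list is an assumption, not from the paper
            # Print only; pipe the output to a shell or job scheduler to launch.
            print(shlex.join(build_command(method, seed)))
```

The script only prints the commands, so it is safe to run before launching anything. This README does not state whether runs that share an `--id` but use different seeds write to the same output location, so check that before starting several seeds of one method.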

## Acknowledgement

This repository builds on code from [PlaNet](https://github.com/Kaixhin/PlaNet), a PyTorch implementation of the PlaNet agent [1].

We thank its authors and contributors for open-sourcing their code.

## References

[1] [Learning Latent Dynamics for Planning from Pixels](https://arxiv.org/abs/1811.04551)
