# Reconnaissance for Reinforcement Learning with Safety Constraints    

For the FrozenLake environment, please see `frozenlake.ipynb`.

For the high-dimensional environments, see the sections below.
All results are presented in the paper.

## Movies
We recommend that you watch the short videos in `movies/` and observe how the agent of each method acts in the Circuit and Jam environments.

## Prerequisites for the experiment
This module depends on Chainer (5.1.0), ChainerRL (0.6.0), and some other packages.<br>
Please see `requirements.txt` and the `Dockerfile` for details. You can also use the Dockerfile to run the high-dimensional experiments. Note that we did not use this Dockerfile in our experiments; we created it afterwards for reproducibility.   
We use [CPO](https://github.com/jachiam/cpo) and [rllab](https://rllab.readthedocs.io/en/latest/index.html) for the CPO-related experiments. Some lines of these libraries are modified for our experiments.

For PointGather, we additionally need [MuJoCo](http://www.mujoco.org/).
After the installation, execute `rllab/scripts/setup_mujoco.sh` and follow the instructions. If something doesn't work, please see the documentation of MuJoCo and rllab.

Our experiments do not depend heavily on GPUs. For the GPU environment, we use cuda==9.0, cudnn==7.1.2, and cupy==5.2.0. These are not included in `requirements.txt`.
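
To compare an existing environment against the versions listed in this section, a small check like the following may help. It is only a sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute (packages that do not are reported as `None`).

```python
import importlib

# Versions stated in this README; cupy is only needed for GPU runs.
EXPECTED = {"chainer": "5.1.0", "chainerrl": "0.6.0", "cupy": "5.2.0"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected_version, found_version_or_None)}."""
    report = {}
    for name, want in expected.items():
        try:
            module = importlib.import_module(name)
            # Assumption: the package exposes __version__; otherwise None.
            have = getattr(module, "__version__", None)
        except ImportError:
            have = None  # package is not installed or cannot be imported
        report[name] = (want, have)
    return report


if __name__ == "__main__":
    for name, (want, have) in check_versions().items():
        status = "ok" if have == want else "mismatch or missing"
        print(f"{name}: expected {want}, found {have} ({status})")
```

A missing `cupy` is expected on CPU-only machines and does not by itself indicate a broken setup.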

## How to view demos
If you'd like to just watch the agents' behavior, please refer to the movies mentioned above.

The commands below assume that you are in the top directory of this repository (where this README is) and that you are not using a Docker container. If you use Docker, you may need some additional setup for the matplotlib visualization.   

For Circuit,
```
# Double DQN agent
$ python3 circuit/run.py --normal --demo --load sample-agents/circuit-dqn-lmd0/  

# RP agent
$ python3 circuit/run.py --demo --load sample-agents/circuit-rp/  

# RP agent before learning
$ python3 circuit/run.py --demo                                   
```

For Jam,
```
# Double DQN agent
$ python3 jam/run.py --normal --demo --load sample-agents/jam-dqn-lmd5/

# RP agent
$ python3 jam/run.py --demo --load sample-agents/jam-rp/

# RP agent before learning
$ python3 jam/run.py --demo

# RP agent with 15 cars
$ python3 jam/run.py --demo --load sample-agents/jam-rp/ -n 15
```

## How to train threat models
First, we need to compile the simulator and collect training data for the threat function from it.
```
$ ./compile.sh
$ python3 circuit/sampler.py
```
Calling `./compile.sh` compiles all of the simulators written in C++.   
The collected data is dumped in the `data/` directory. Now we are ready to learn the threat function as follows.
```
$ python3 circuit/threat_trainer.py
```
The learned model is saved as `circuit/threat.model`. Note that this script overwrites the attached pretrained model.

For the Jam task, we can learn the threat functions in a similar way:
```
$ python3 jam/wall_sampler.py
$ python3 jam/wall_trainer.py
$ python3 jam/car_sampler.py
$ python3 jam/car_trainer.py
```
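
If you prefer to run the four Jam steps in one go, a small driver along these lines could be used. It is a sketch rather than a script shipped with this repository: `run_steps` is a hypothetical helper, and it simply assumes that the order listed above (each sampler before its trainer) is the intended one and that `./compile.sh` has already been run.

```python
import subprocess
import sys

# The four Jam steps, in the order given in this README.
JAM_STEPS = [
    "jam/wall_sampler.py",
    "jam/wall_trainer.py",
    "jam/car_sampler.py",
    "jam/car_trainer.py",
]


def run_steps(steps=JAM_STEPS, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure."""
    done = []
    for script in steps:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        done.append(script)
    return done


if __name__ == "__main__":
    # Run from the top directory of the repository.
    run_steps()
```

Stopping at the first failure avoids training a threat model on missing or partial sample data.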

## How to train reward-seeking policy
To train the reward-seeking policy, run one of the following:
```
$ python3 circuit/run.py                                   # for RP agent in Circuit
$ python3 circuit/run.py --normal --lmd (lambda_value)     # for DoubleDQN agent in Circuit
$ python3 jam/run.py                                       # for RP agent in Jam
$ python3 jam/run.py --normal --lmd (lambda_value)         # for DoubleDQN agent in Jam
```

To evaluate the crash rate of greedy agents, we used the following command:
```
$ (the same command as above) --eval (directory produced in the training process)
```
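
When running several training and evaluation commands, it can be convenient to build them programmatically. The sketch below only assembles argument lists from the flags shown above (`--normal`, `--lmd`, `--eval`); the helper names, the example lambda value, and the result directory are hypothetical placeholders, not values taken from the experiments.

```python
def train_command(env, lmd=None):
    """Build the training command for `env` ("circuit" or "jam").

    With lmd=None this is the RP agent; otherwise the DoubleDQN agent
    trained with the given lambda value.
    """
    cmd = ["python3", f"{env}/run.py"]
    if lmd is not None:
        cmd += ["--normal", "--lmd", str(lmd)]
    return cmd


def eval_command(train_cmd, result_dir):
    """Append --eval to a training command, as described in this README."""
    return list(train_cmd) + ["--eval", result_dir]


if __name__ == "__main__":
    # "path/to/training-output" is a placeholder for the directory
    # produced by the training run.
    cmd = eval_command(train_command("jam", lmd=5), "path/to/training-output")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues with directory names.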

## How to run MPC agent
```
$ python3 circuit/model_control.py
$ python3 jam/model_control.py
```

## How to train CPO agent
```
$ python3 cpo/experiments/CPO_circuit.py
$ python3 cpo/experiments/CPO_jam.py
```

## PointGather
After setting up MuJoCo, you can sample the data, learn the threat model, and train the reward-seeking policy as in the other environments, using the scripts in `cpo/experiments/`. You can change the safety threshold for the RP agent with the `--limit` option.
For example,
```
$ python3 cpo/experiments/DQN_point_gather.py --limit 0.1
```

***
Submitted to 9th International Conference on Learning Representations (ICLR 2021). Do not distribute.
