
## Code Accompanying "A Risk-Sensitive Policy Gradient Method"

----------------------

### Building & Installing

To install this code and get it running, first set up a Python virtual environment. We developed and tested this code using
a conda environment and Python 3.6, but other configurations will likely reproduce the results as well. Create the
environment and then do the following:
 
1. Clone this repository.

2.  This code base uses PyTorch.  To ensure that you get the correct version, go to their 
[installation page](https://pytorch.org/get-started/locally/) and enter your system requirements.


3. If you want to run Safety Gym experiments, clone their repository and place it in the `./envs` folder.
    - Running Safety Gym requires MuJoCo, for which you will need a license.  Our key is included in the `./mujoco` folder
      here; please don't share it more widely.  Further instructions for downloading and installing MuJoCo can be found on
      [their website](http://www.mujoco.org).
    - Install Safety Gym by going to `./envs/safety-gym` and typing `pip install -e .`.
 
4. The parallelization in this codebase relies on MPI.  We found that the conda-forge build of `mpi4py` works well;
to install it, run `conda install -c conda-forge mpi4py`.

5. Install the `risk` package and its subpackages: from the repository's root folder, run `pip install -e .`.
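After completing these steps, a quick sanity check is to confirm that the key packages can be found. The sketch below is a hypothetical helper, not part of this repository. It assumes the import names `torch`, `mpi4py`, and `risk` for steps 2, 4, and 5, and `safety_gym`/`mujoco_py` for the optional Safety Gym setup in step 3.

```python
import importlib.util
import sys

# Assumed import names: "risk" is this repo's package (step 5); "safety_gym"
# and "mujoco_py" are only needed for Safety Gym experiments (step 3).
REQUIRED = ["torch", "mpi4py", "risk"]
OPTIONAL = ["safety_gym", "mujoco_py"]


def check_environment(required=REQUIRED, optional=OPTIONAL):
    """Return a dict mapping each module name to whether it can be found."""
    report = {}
    for name in list(required) + list(optional):
        # find_spec locates a top-level module without actually importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


def main():
    """Print a short report and return the list of missing required packages."""
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    status = check_environment()
    for name, found in status.items():
        kind = "optional" if name in OPTIONAL else "required"
        print(f"{name:12s} ({kind}): {'ok' if found else 'MISSING'}")
    return [name for name in REQUIRED if not status[name]]


if __name__ == "__main__":
    main()
```

Using `find_spec` rather than `import` keeps the check fast and avoids side effects such as MuJoCo's license lookup at import time.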

----------------------

### Repository Notes

- As mentioned above, the code's parallelization has been migrated to MPI, and we're trying to maintain a more
standard architecture for the learning code.
  
- The `./risk/common` folder has pieces useful to multiple packages and algorithms, while the algorithms in the other packages are more specific.

- The file `./risk/rl/policy_optimizer.py` is a general on-policy learner.  For an example of configuring it to act as
PPO in the style of Spinning Up, see `configs/er_0.json`.
  
- The CDF-based RL code (`./risk/cdf_rl/cdf_policy_optimizer.py`) inherits from `./risk/rl/policy_optimizer.py`.
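For readers unfamiliar with Spinning Up's PPO, the core of that algorithm is the clipped surrogate objective. The stdlib-only sketch below illustrates that loss for a batch of samples. It is not extracted from `policy_optimizer.py`, the function name `ppo_clip_loss` is hypothetical, and the default `clip_ratio=0.2` follows Spinning Up's default rather than anything known about `configs/er_0.json`.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    """Negative PPO-clip surrogate averaged over a batch (a loss to minimize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and have the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In a PyTorch learner the same expression would be computed on tensors so that gradients flow through `logp_new`; the loop form here only makes the arithmetic explicit.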

----------------------

### Training and Testing


Running the `./risk/rl/policy_optimizer` and `./risk/cdf_rl/cdf_policy_optimizer` code is straightforward.  For instance,
one could navigate to the `./risk/cdf_rl` folder and run training and testing using

`mpiexec -n 8 python ppo.py --config ../../configs/CarButton2/w_c_0.json`

and

`mpiexec -n 8 python ppo.py --config ../../configs/CarButton2/w_c_0.json --mode test`

respectively.  Note that the default mode is `train` in all learning code.  By default, `rl/policy_optimizer` and
`cdf_rl/cdf_policy_optimizer` start a new training run from scratch; to pick up where a previous training run left off
instead, change the `use_prior_nets` field in the config.
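If you launch many runs, it can help to build these commands programmatically. The following is a hypothetical convenience sketch (none of these functions exist in the repository): it rebuilds the `mpiexec` commands shown above and writes a copy of a config with `use_prior_nets` enabled. It assumes `use_prior_nets` is a top-level boolean key in the JSON config; check an existing config before relying on that.

```python
import json
import subprocess


def build_command(config_path, mode="train", n_procs=8, script="ppo.py"):
    """Build the mpiexec argument list for a training or testing run."""
    cmd = ["mpiexec", "-n", str(n_procs), "python", script, "--config", config_path]
    if mode != "train":  # "train" is the default mode, so only pass --mode otherwise
        cmd += ["--mode", mode]
    return cmd


def write_resume_config(src, dst):
    """Copy a config with use_prior_nets set, so training resumes from saved nets.

    Assumption: use_prior_nets is a top-level boolean key in the JSON config.
    """
    with open(src) as f:
        config = json.load(f)
    config["use_prior_nets"] = True
    with open(dst, "w") as f:
        json.dump(config, f, indent=2)
    return config


def launch(config_path, mode="train", n_procs=8):
    """Run the command; call from ./risk/cdf_rl so relative config paths resolve."""
    return subprocess.run(build_command(config_path, mode, n_procs), check=True)
```

For example, from `./risk/cdf_rl`, `launch("../../configs/CarButton2/w_c_0.json", mode="test")` would run the same test command as above.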

In general, configuration files should go in the `./configs` folder and TensorBoard log files in the `./logs` folder.
Model files (`.pt` format) and test results (`.pkl` format) should go in the `./output` folder.  Shell scripts for running
the `cdf_rl` code on abyss can be found in `./runs`, and plotting code can be found in `./risk/plotting`.
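To collect results from `./output`, something like the following sketch can help. The helper names are hypothetical, the code assumes outputs may sit in subdirectories of `./output`, and it makes no assumption about what the `.pkl` test results contain, so inspect the unpickled object to learn its structure. Model `.pt` files would be loaded with PyTorch (e.g. `torch.load`) rather than `pickle`.

```python
import pickle
from pathlib import Path


def list_outputs(output_dir="./output"):
    """Return sorted lists of model (.pt) and test-result (.pkl) file paths."""
    root = Path(output_dir)
    # rglob also searches subdirectories, in case runs are grouped by folder
    models = sorted(root.rglob("*.pt"))
    results = sorted(root.rglob("*.pkl"))
    return models, results


def load_test_results(path):
    """Unpickle one test-results file; its structure depends on the code that wrote it."""
    # Only unpickle files you produced yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```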

To view the progress of your runs, go to `./logs` and launch TensorBoard from your Python environment.  This can be done
with a command like `tensorboard --logdir=. --port=7300` (use whatever port you like; the default is 6006).  Locally
(on a Mac) you can then view TensorBoard at [http://localhost:7300](http://localhost:7300), substituting your port number.
If `localhost` doesn't work, try replacing it with your machine's IP address.
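When TensorBoard doesn't seem to load, it is worth checking whether anything is actually listening on the port. The sketch below is a hypothetical helper, not part of this repository: it starts TensorBoard on `./logs` (equivalent to running `tensorboard --logdir=.` from inside `./logs`), builds the local URL, and tests the port with a plain TCP connection.

```python
import socket
import subprocess


def tensorboard_url(port=6006, host="localhost"):
    # TensorBoard serves plain HTTP by default, so the scheme is http, not https
    return f"http://{host}:{port}"


def is_listening(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


def start_tensorboard(logdir="./logs", port=7300):
    """Start TensorBoard in the background and return the process handle."""
    return subprocess.Popen(["tensorboard", f"--logdir={logdir}", f"--port={port}"])
```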

----------------------
