# Bridging the Imitation Gap by Adaptive Insubordination


## Table of contents

1. [Installation](#installation)
1. [Generating plots from included TSV files](#generating-hyperparameter-plots-from-tsvs)
1. [Generating TSV files](#generating-tsvs)
1. [Poisoned Doors and 2D-Lighthouse](#poisoned-doors-and-2d-lighthouse)
1. [Additional information](#additional-information)


## Installation

Begin by cloning this repository to your local machine and moving into the top-level directory.


This library has been tested **only in Python 3.6**; the following assumes you have a working
version of **Python 3.6** installed locally. To install requirements we recommend
using [`pipenv`](https://pipenv.kennethreitz.org/en/latest/), but we also include instructions if
you would prefer to install things directly using `pip`.

### Installing requirements with `pipenv` (*recommended*)

If you have already installed [`pipenv`](https://pipenv.kennethreitz.org/en/latest/), you may
run the following to install all requirements.

```bash
pipenv install --skip-lock --dev
```

This should automatically resolve any dependencies and give output like:
```bash
Creating a virtualenv for this project…
Pipfile: /Users/USERNAME/neurips-20-advisor/Pipfile
Using /Users/USERNAME/anaconda3/bin/python3 (3.6.4) to create virtualenv…
⠙ Creating virtual environment...Using base prefix '/Users/USERNAME/anaconda3'
New python executable in /Users/USERNAME/.local/share/virtualenvs/neurips-20-advisor-I8NdWLLl/bin/python3
Also creating executable in /Users/USERNAME/.local/share/virtualenvs/neurips-20-advisor-I8NdWLLl/bin/python
Installing setuptools, pip, wheel...done.
Running virtualenv with interpreter /Users/USERNAME/anaconda3/bin/python3

✔ Successfully created virtual environment! 
Virtualenv location: /Users/USERNAME/.local/share/virtualenvs/neurips-20-advisor-I8NdWLLl
Installing dependencies from Pipfile…
An error occurred while installing matplotlib! Will try again.
An error occurred while installing torchvision~=0.5.0! Will try again.
An error occurred while installing moviepy! Will try again.
An error occurred while installing pandas! Will try again.
An error occurred while installing seaborn! Will try again.
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 20/20 — 00:02:03
Installing initially failed dependencies…
  ☤  ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:17
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
```
### Installing requirements with `pip`

Note: *do not* run the following if you have already installed requirements with `pipenv`
as above. If you prefer using `pip`, you may install all requirements as follows:

```bash
pip install -r requirements.txt
```

Depending on your machine configuration, you may need to use `pip3` instead of `pip` in the
above.
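
Since the code has been tested only in Python 3.6, it can help to confirm which interpreter is active before installing anything. The following is a minimal sketch; the helper name `is_tested_python` is hypothetical and not part of this repository.

```python
import sys

# The README states the code was tested only in Python 3.6.
TESTED_VERSION = (3, 6)


def is_tested_python(version_info=sys.version_info):
    """Return True if the given version tuple is a Python 3.6.x release."""
    return tuple(version_info[:2]) == TESTED_VERSION


if not is_tested_python():
    # Only warn; other versions are untested here, not known to fail.
    print(
        "Warning: running Python {}.{}; this code was tested only in 3.6.".format(
            *sys.version_info[:2]
        )
    )
```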

## Generating Hyperparameter Plots from TSVs
We have included 8 TSV files in `experiment_output/minigrid_random_hp_runs/`, corresponding to the 8 MiniGrid tasks in Figs. 7 and 8 of the supplement. Before release, we will make the names consistent with the paper. Each TSV file covers 50 models, trained with randomly sampled hyperparameters, for each of the 13 baselines included in this work.

The following command generates the plots for the task `WallCrossingCorruptExpertS25N10`, i.e. `WC Corrupt (S25, N10)` in the submission (runtime: < 1 min).

```bash
pipenv run python extensions/rl_minigrid/minigrid_scripts/summarize_random_hp_search.py --env_name WallCrossingCorruptExpertS25N10
```

This prints output like:

```bash
random_hp_search_minigrid_runs_WallCrossingCorruptExpertS25N10.tsv
{'bc': 50, 'dagger': 50, 'bc_teacher_forcing': 50, 'ppo': 50, 'bc_then_ppo': 50, 'dagger_then_ppo': 50, 'bc_teacher_forcing_then_ppo': 50, 'advisor_fixed_alpha_different_heads': 50, 'dagger_then_advisor_fixed_alpha_different_head_weights': 50, 'bc_teacher_forcing_then_advisor_fixed_alpha_different_head_weights': 50, 'pure_offpolicy': 50, 'ppo_with_offpolicy': 50, 'ppo_with_offpolicy_advisor_fixed_alpha_different_heads': 50}
650
{'bc': 50, 'dagger': 50, 'bc_teacher_forcing': 50, 'ppo': 50, 'bc_then_ppo': 50, 'dagger_then_ppo': 50, 'bc_teacher_forcing_then_ppo': 50, 'advisor_fixed_alpha_different_heads': 50, 'dagger_then_advisor_fixed_alpha_different_head_weights': 50, 'bc_teacher_forcing_then_advisor_fixed_alpha_different_head_weights': 50, 'pure_offpolicy': 50, 'ppo_with_offpolicy': 50, 'ppo_with_offpolicy_advisor_fixed_alpha_different_heads': 50}
650
```
The dictionaries list each of the 13 baselines with the number of models for which hyperparameters were searched; 650 matches their total (13 × 50).
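
As a quick sanity check on the output above, the per-baseline counts can be summed by hand. This sketch only reuses the baseline names and counts printed above; the helper `summarize` is hypothetical and does not read the TSV files.

```python
# Baseline names copied from the example output above.
BASELINES = [
    "bc",
    "dagger",
    "bc_teacher_forcing",
    "ppo",
    "bc_then_ppo",
    "dagger_then_ppo",
    "bc_teacher_forcing_then_ppo",
    "advisor_fixed_alpha_different_heads",
    "dagger_then_advisor_fixed_alpha_different_head_weights",
    "bc_teacher_forcing_then_advisor_fixed_alpha_different_head_weights",
    "pure_offpolicy",
    "ppo_with_offpolicy",
    "ppo_with_offpolicy_advisor_fixed_alpha_different_heads",
]
MODELS_PER_BASELINE = 50

counts = {name: MODELS_PER_BASELINE for name in BASELINES}


def summarize(counts):
    """Return (number of baselines, total number of models)."""
    return len(counts), sum(counts.values())


num_baselines, total_models = summarize(counts)
print(num_baselines, total_models)  # 13 650
```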

Omitting the `--env_name` flag, as in the following command, processes all eight tasks (runtime: < 5 min):

```bash
pipenv run python extensions/rl_minigrid/minigrid_scripts/summarize_random_hp_search.py
```
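
To see which values `--env_name` could take, one option is to read the task names off the included TSV file names. The sketch below assumes every TSV follows the `random_hp_search_minigrid_runs_<env_name>.tsv` pattern seen in the example output; the helper names are hypothetical and not part of this repository.

```python
from pathlib import Path

# Prefix taken from the file name in the example output; that all TSVs
# share it is an assumption.
TSV_PREFIX = "random_hp_search_minigrid_runs_"


def env_name_from_tsv(file_name):
    """Return the env name encoded in a TSV file name, or None if it does not match."""
    name = Path(file_name).name
    if name.startswith(TSV_PREFIX) and name.endswith(".tsv"):
        return name[len(TSV_PREFIX):-len(".tsv")]
    return None


def env_names_in(tsv_dir):
    """Return the sorted env names found in tsv_dir (empty if the directory is missing)."""
    tsv_dir = Path(tsv_dir)
    if not tsv_dir.is_dir():
        return []
    names = (env_name_from_tsv(p.name) for p in tsv_dir.iterdir())
    return sorted(n for n in names if n is not None)


print(env_names_in("experiment_output/minigrid_random_hp_runs"))
```

Run from the top-level directory, this should print one name per included TSV file; entries that do not match the pattern (such as the `plots/` folder) are skipped.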

The plots (.pdfs) are saved under:
`experiment_output/minigrid_random_hp_runs/plots/`

These PDFs are already included; the above commands regenerate and overwrite them.

## Generating TSVs
We have included the scripts used to generate the TSV files in this code submission; see `extensions/rl_minigrid/minigrid_scripts/minigrid_random_hp_search.py`. These scripts take 18-24 hours on a machine with 48 CPUs and 4 NVIDIA T4 GPUs (a `g4dn.12xlarge` AWS instance). For the `WallCrossingS25N10` task, run the following command (runtime: < 24 hours on `g4dn.12xlarge`):
```bash
pipenv run python extensions/rl_minigrid/minigrid_scripts/minigrid_random_hp_search.py --experiment_base extensions/rl_minigrid/minigrid_experiments/key_corridor/ --single_process_training --output_dir experiment_output/minigrid_random_hp_runs --env_name WallCrossingS25N10 --disable_logging
```
As mentioned above, this command trains 50 models for every baseline, including those based solely on expert demonstrations. For these to work, demonstrations must be saved for each of the tasks, which the `minigrid_scripts/save_expert_demos.py` script handles. For example, to save demonstrations for the `WallCrossingS25N10` task, use the following (runtime: < 15 min on `g4dn.12xlarge`):
```bash
pipenv run python minigrid_scripts/save_expert_demos.py --experiment bc --experiment_base minigrid_experiments/key_corridor --output_dir minigrid_data/minigrid_demos --gp "minigrid_env_name.name = 'WallCrossingS25N10'"
```
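
Because demonstrations are needed for each task, the command above can be parameterized by task name. The following is a sketch only: the helpers `save_demos_command` and `save_demos` are hypothetical, the arguments are copied from the command above, and only the one task name shown there is listed.

```python
import subprocess


def save_demos_command(env_name):
    """Build the argument list of the demo-saving command for one task."""
    return [
        "pipenv", "run", "python", "minigrid_scripts/save_expert_demos.py",
        "--experiment", "bc",
        "--experiment_base", "minigrid_experiments/key_corridor",
        "--output_dir", "minigrid_data/minigrid_demos",
        "--gp", "minigrid_env_name.name = '{}'".format(env_name),
    ]


def save_demos(env_names, dry_run=True):
    """Print each command, and run it when dry_run is False."""
    for env_name in env_names:
        cmd = save_demos_command(env_name)
        print(cmd)
        if not dry_run:
            # check=True raises if the script exits with a non-zero status.
            subprocess.run(cmd, check=True)


# Only the task name shown in the command above; add the remaining tasks as needed.
save_demos(["WallCrossingS25N10"])
```

Passing the arguments as a list avoids shell quoting problems with the `--gp` value, which contains both spaces and single quotes.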

## Poisoned Doors and 2D-Lighthouse
All of the above experiments can be repeated for Poisoned Doors. The corresponding folder, `extensions/rl_poisoneddoors`, includes the analogous scripts `poisoneddoors_random_hp_search.py` and `summarize_random_hp_search.py`.
In particular, as with the MiniGrid tasks, you can generate plots using the following command (runtime: < 1 min):
```bash
pipenv run python extensions/rl_poisoneddoors/poisoneddoors_scripts/summarize_random_hp_search.py
``` 
The plots (.pdfs) are saved under:
`experiment_output/poisoneddoors_random_hp_runs/plots/`

These PDFs are already included; the above command regenerates and overwrites them.


For 2D-Lighthouse, we have included all the code. Before release, we will include additional details on how to run the code in `extensions/rl_lighthouse`.

## Additional information
### Contributions

If you would like to contribute, we recommend first submitting an issue describing your
proposed improvement. Doing so lets us validate your suggestion before you spend a great
deal of time on it. Small (or validated) improvements and bug fixes should be made via a
pull request from your fork of this repository.
 
 
### References and anonymity
This work builds upon
the open-sourced [pytorch-a2c-ppo-acktr](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail) 
library of Ilya Kostrikov and uses some data structures from FAIR's open-sourced
[habitat-api](https://github.com/facebookresearch/habitat-api). These have been referenced in the code, and clear disclaimers to maintain anonymity have also been included.