# Installation
1. This code base requires Python 3.7 or higher. All package requirements are listed in
`requirements.txt`. To install from scratch using Anaconda (or Miniconda), use the following
commands.

```
conda create -n [your_env_name] python=3.7
source activate [your_env_name]
pip install -r requirements.txt
cd rl-toolkit && pip install -e . && cd -
```

2. Create a `tmp` directory in your home directory to store environment logs, or point `--env-log-dir` at a different directory.

3. Download the expert demonstration datasets, which are hosted on Google Drive.

[Link to Expert Demonstration Datasets](https://drive.google.com/drive/folders/1Z9N7fTYb3uul-lgTC_zlJrMaDAkYYJdR?usp=sharing). Download all files and place them in `./expert_datasets` relative to the root of this project.
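
Before launching experiments, it can help to confirm that the downloaded files are where the commands expect them. The following is a minimal sketch, not part of this code base: the helper name `missing_datasets` is hypothetical, and the file list contains only the names referenced by the commands in this README (the Google Drive folder may contain other files).

```python
from pathlib import Path

# Dataset file names referenced by the commands in this README.
# This list is an assumption about what you need, not a manifest of the download.
EXPECTED_DATASETS = [
    "gw_100.pt", "gw_75.pt", "gw_50.pt", "gw_25.pt",
    "push_100.pt", "push_75.pt", "push_50.pt", "push_25.pt",
    "pick_100.pt", "pick_75.pt", "pick_50.pt", "pick_25.pt",
    "ant_50.pt",
]


def missing_datasets(root="./expert_datasets"):
    """Return the expected dataset files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All expected datasets found.")
```

Run it from the root of the project so that the relative path `./expert_datasets` resolves correctly.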

# Commands

By default, the commands use demonstrations covering 100% of the states.
The additional arguments needed to run the experiments with expert
demonstrations that cover 75%, 50%, or 25% of the states are listed below the
main commands. These additional arguments are the same for our method and the
baselines. For all experiments, you can set the random seed with `--seed`.
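
To repeat an experiment over several seeds, one option is to generate the command lines programmatically. This is a minimal sketch under stated assumptions: the helper `with_seed` is hypothetical, the seed values are arbitrary, and the base command is an abbreviated stand-in for one of the full commands listed in the sections below.

```python
import shlex


def with_seed(command, seed):
    """Split a shell-style command string and append `--seed <seed>`."""
    return shlex.split(command) + ["--seed", str(seed)]


# Abbreviated for illustration; substitute one of the full commands from this README.
base = 'python goal_prox/main.py --prefix dpf --env-name "MiniGrid-FourRooms-v0" --alg dpf'

for seed in (1, 2, 3):  # arbitrary example seeds
    argv = with_seed(base, seed)
    # Re-quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Because `shlex.split` does not expand `~`, a path such as `~/tmp` stays literal in the argument list. Expand it with `os.path.expanduser` before passing the list to `subprocess.run`, or run the printed line through a shell.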

## Navigation

- Ours: `python goal_prox/main.py --prefix dpf --linear-lr-decay True --env-log-dir ~/tmp --env-name "MiniGrid-FourRooms-v0" --alg dpf --traj-batch-size 32 --num-env-steps 1e7 --pf-delta 0.01 --pf-uncert-scale 0.1 --eval-interval -1 --gw-img --frame-stack False --pf-state-norm False --traj-load-path ./expert_datasets/gw_100.pt --action-input False --traj-frac 0.25`
- GAIfO-s: `python goal_prox/main.py --prefix gail --linear-lr-decay True --env-log-dir ~/tmp --env-name "MiniGrid-FourRooms-v0" --alg gail --eval-interval 200 --gw-img --frame-stack False --num-env-steps 1e7 --disc-lr 0.0001 --eval-interval -1 --action-input False --traj-load-path ./expert_datasets/gw_100.pt --traj-frac 0.25`
- GAIfO: `python goal_prox/main.py --prefix gaifo --linear-lr-decay True --env-log-dir ~/tmp --env-name "MiniGrid-FourRooms-v0" --alg gaifo --eval-interval 200 --gw-img --frame-stack False --num-env-steps 1e7 --disc-lr 0.001 --eval-interval -1 --traj-load-path ./expert_datasets/gw_100.pt --traj-frac 0.25`
- BCO: `python goal_prox/main.py --prefix bco --linear-lr-decay True --env-log-dir ~/tmp --env-name "MiniGrid-FourRooms-v0" --alg bco --gw-img --frame-stack False --save-interval -1  --max-grad-norm -1 --lr 0.0001 --bco-expl-steps 10000 --bco-inv-lr 0.0001 --bco-inv-epochs 1 --bco-inv-eval-holdout 0.1 --bco-inv-batch-size 32 --bc-num-epochs 1 --bco-alpha 500 --bco-alpha-size 10000 --traj-batch-size 32 --num-render 0 --num-processes 32 --eval-num-processes 32 --traj-load-path ./expert_datasets/gw_100.pt --traj-frac 0.25`
- BC: `python goal_prox/main.py --prefix bc --linear-lr-decay True --env-log-dir ~/tmp --sync --env-name "MiniGrid-FourRooms-v0" --alg bc --eval-interval -1 --gw-img --frame-stack False --num-env-steps 1e7 --traj-load-path ./expert_datasets/gw_100.pt --traj-frac 0.25  --save-interval -1  --sync --bc-num-epochs 1000 --num-render 0 --eval-interval 2000 --lr 0.0001 --eval-num-processes 32 --num-eval 10  --max-grad-norm -1  --traj-val-ratio 0.2 --traj-batch-size 64`
- GAIL: `python goal_prox/main.py --prefix gail --linear-lr-decay True --env-log-dir ~/tmp --sync --env-name "MiniGrid-FourRooms-v0" --alg gail --eval-interval -1 --gw-img --frame-stack False --num-env-steps 1e7 --disc-lr 0.0001 --eval-interval -1 --traj-load-path ./expert_datasets/gw_100.pt --traj-frac 0.25 --action-input True  --save-interval 200  --sync --gail-reward-norm True`

For running holdout experiments specify the following additional arguments:
- 75%: `--traj-load-path ./expert_datasets/gw_75.pt --gw-cover 0.75 --gw-compl`
- 50%: `--traj-load-path ./expert_datasets/gw_50.pt --gw-cover 0.5 --gw-compl`
- 25%: `--traj-load-path ./expert_datasets/gw_25.pt --gw-cover 0.25 --gw-compl`
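
The holdout arguments repeat `--traj-load-path`, which already appears in the main commands. If the entry point parses its options with Python's `argparse` (an assumption, this README does not confirm it), the last occurrence of a repeated option wins, so appending the holdout arguments to the end of a main command is enough. The sketch below demonstrates that behavior with a stand-in parser; the option types and defaults are hypothetical.

```python
import argparse

# Hypothetical stand-in parser; the real option definitions live in the project code.
parser = argparse.ArgumentParser()
parser.add_argument("--traj-load-path", type=str)
parser.add_argument("--gw-cover", type=float, default=1.0)  # default is an assumption
parser.add_argument("--gw-compl", action="store_true")      # flag type is an assumption

main_args = ["--traj-load-path", "./expert_datasets/gw_100.pt"]
holdout_args = [
    "--traj-load-path", "./expert_datasets/gw_50.pt",
    "--gw-cover", "0.5",
    "--gw-compl",
]

# With argparse's default "store" action, the later value replaces the earlier one.
args = parser.parse_args(main_args + holdout_args)
print(args.traj_load_path)  # ./expert_datasets/gw_50.pt
```

If you prefer not to rely on this behavior, edit the `--traj-load-path` value in the main command directly instead of appending a second one.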

## Fetch Push

- Ours: `python goal_prox/main.py --prefix dpf-deep --use-proper-time-limits --linear-lr-decay True --lr 0.001 --num-env-steps 1e7 --alg dpf-deep --env-log-dir ~/tmp --vid-fps 30 --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.01 --env-name FetchPushEnvCustom-v0 --il-in-action-norm --il-out-action-norm --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs --pf-delta 0.02 --pf-state-norm False --exp-sample-size 4096 --exp-buff-size 4096 --entropy-coef 0.01 --num-env-steps 5e6 --lr 3e-4 --pf-reward-scale 10.0 --pf-uncert-scale 0.01`
- GAIfO-s: `python goal_prox/main.py --prefix gail-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gail-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.0001 --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.01 --env-name FetchPushEnvCustom-v0 --action-input False --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs`
- GAIfO: `python goal_prox/main.py --prefix gaifo-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gaifo-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.001 --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.01 --env-name FetchPushEnvCustom-v0 --action-input False --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs`
- BCO: `python goal_prox/main.py --prefix bco --linear-lr-decay True --alg bco --env-log-dir ~/tmp --save-interval -1 --num-render 0 --vid-fps 30 --normalize-env False --bc-num-epochs 1 --max-grad-norm -1 --bco-expl-steps 10000 --bco-inv-lr 0.0001 --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs --fetch-cover 1.0 --env-name FetchPushEnvCustom-v0 --bco-inv-epochs 1 --lr 0.0005 --bco-alpha 500 --bco-alpha-size 10000 --num-processes 32 --eval-num-processes 32`
- BC: `python goal_prox/main.py --prefix bc --use-proper-time-limits --linear-lr-decay True --lr 0.001 --num-env-steps 1e7 --alg bc --env-log-dir ~/tmp --sync --vid-fps 30 --save-interval -1 --eval-interval 2000 --env-name FetchPushEnvCustom-v0  --normalize-env False  --il-in-action-norm --il-out-action-norm --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs --fetch-cover 1.0 --bc-num-epochs 1000 --eval-num-processes 20 --num-eval 50 --num-render 0  --traj-val-ratio 0.2`
- GAIL: `python goal_prox/main.py --prefix gail-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gail-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.0001 --save-interval 50 --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.01 --eval-interval -1 --env-name FetchPushEnvCustom-v0  --normalize-env True --action-input True  --traj-load-path ./expert_datasets/push_100.pt --fetch-easy-obs --fetch-cover 1.0 --gail-reward-norm True --sync --il-in-action-norm --il-out-action-norm --num-env-steps 5e6`

For running holdout experiments specify the following additional arguments:
- 75%: `--traj-load-path ./expert_datasets/push_75.pt`
- 50%: `--traj-load-path ./expert_datasets/push_50.pt`
- 25%: `--traj-load-path ./expert_datasets/push_25.pt`

For running added noise experiments specify the following additional arguments: 
- 1.25x: `--noise-ratio 1.25 --goal-noise-ratio 1.25`
- 1.75x: `--noise-ratio 1.75 --goal-noise-ratio 1.75`
- 2.0x: `--noise-ratio 2.0 --goal-noise-ratio 2.0`

## Fetch Pick

- Ours: `python goal_prox/main.py --use-proper-time-limits --linear-lr-decay False --num-env-steps 1e7 --env-log-dir ~/tmp --eval-num-processes 1 --vid-fps 30 --pf-delta 0.02 --pf-state-norm False --num-mini-batch 32 --num-epochs 10 --pf-reward-norm False --lr 0.001 --alg dpf-deep --prefix dpf-deep --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --exp-succ-scale 1 --fetch-easy-obs --il-in-action-norm --il-out-action-norm --exp-sample-size 4096 --exp-buff-size 4096  --pf-uncert-scale 0.001 --num-env-steps 5e6 --pf-uncert-scale 0.01 --entropy-coef 0.01 --pf-reward-scale 10.0`
- GAIfO-s: `python goal_prox/main.py --prefix gail-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gail-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.0001 --il-out-action-norm --il-in-action-norm --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.001 --action-input False --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --fetch-easy-obs`
- GAIfO: `python goal_prox/main.py --prefix gaifo-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gaifo-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.0001 --il-out-action-norm --il-in-action-norm --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.001 --action-input False --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --fetch-easy-obs`
- BCO: `python goal_prox/main.py --prefix bco --linear-lr-decay True --alg bco --env-log-dir ~/tmp --save-interval -1 --num-render 0 --vid-fps 30 --bc-state-norm --max-grad-norm -1 --normalize-env False --lr 0.0005 --bc-num-epochs 1 --bco-expl-steps 10000 --bco-inv-lr 0.0001 --bco-inv-epochs 1 --bco-alpha 500 --bco-alpha-size 10000 --eval-num-processes 32 --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --fetch-easy-obs --num-processes 32 --log-interval 1`
- BC: `python goal_prox/main.py --prefix bc --use-proper-time-limits --linear-lr-decay True --num-env-steps 1e7 --alg bc --env-log-dir ~/tmp --vid-fps 30 --save-interval -1 --il-out-action-norm --il-in-action-norm --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --fetch-easy-obs --fetch-cover 1.0 --sync --num-render 0  --normalize-env False --lr 0.001 --bc-num-epochs 1000 --eval-interval 2000 --eval-num-processes 20 --num-eval 50 --traj-val-ratio 0.2`
- GAIL: `python goal_prox/main.py --prefix gail-deep --use-proper-time-limits --linear-lr-decay True --lr 3e-4 --num-env-steps 1e7 --alg gail-deep --env-log-dir ~/tmp --vid-fps 30 --disc-lr 0.0001 --save-interval 200 --eval-interval -1 --il-out-action-norm --il-in-action-norm --num-mini-batch 32 --num-epochs 10 --entropy-coef 0.01 --eval-interval -1 --action-input True --traj-load-path ./expert_datasets/pick_100.pt --env-name FetchPickAndPlaceDiffHoldout-v0 --fetch-easy-obs  --fetch-cover 1.0 --gail-reward-norm False --sync`

For running holdout experiments specify the following additional arguments:
- 75%: `--traj-load-path ./expert_datasets/pick_75.pt`
- 50%: `--traj-load-path ./expert_datasets/pick_50.pt`
- 25%: `--traj-load-path ./expert_datasets/pick_25.pt`

For running added noise experiments specify the following additional arguments: 
- 1.25x: `--noise-ratio 1.25 --goal-noise-ratio 1.25`
- 1.75x: `--noise-ratio 1.75 --goal-noise-ratio 1.75`
- 2.0x: `--noise-ratio 2.0 --goal-noise-ratio 2.0`

## Ant

- Ours: `python goal_prox/main.py --prefix ant-ours --num-env-steps 5e6 --linear-lr-decay True --env-log-dir ~/tmp --eval-num-processes 1 --lr 3e-4 --alg dpf-deep --traj-load-path ./expert_datasets/ant_50.pt --env-name AntGoal-v0 --pf-delta 0.01 --pf-uncert-scale 0.01 --num-steps 500 --pf-reward-scale 50.0 --traj-frac 0.5 --entropy-coef 0.001 --num-mini-batch 32 --num-epochs 10 --il-in-action-norm --il-out-action-norm --prox-lr 0.0001`
- GAIfO-s: `python goal_prox/main.py --prefix ant-gaifo-s --linear-lr-decay True --lr 3e-4 --num-env-steps 5e6 --alg gail-deep --env-log-dir ~/tmp --eval-num-processes 1 --env-name AntGoal-v0 --traj-load-path ./expert_datasets/ant_50.pt --num-steps 500 --traj-frac 0.5 --entropy-coef 0.001 --num-mini-batch 32 --num-epochs 10 --il-in-action-norm --il-out-action-norm`
- GAIfO: `python goal_prox/main.py --prefix ant-gaifo --linear-lr-decay True --lr 3e-4 --num-env-steps 5e6 --alg gaifo-deep --env-log-dir ~/tmp --eval-num-processes 1 --traj-load-path ./expert_datasets/ant_50.pt --env-name AntGoal-v0 --num-steps 500 --traj-frac 0.5 --entropy-coef 0.001 --num-mini-batch 32 --num-epochs 10 --il-in-action-norm --il-out-action-norm`
- BCO: `python goal_prox/main.py --prefix ant-bco --linear-lr-decay True --env-log-dir ~/tmp --env-name AntGoal-v0 --alg bco --save-interval -1 --max-grad-norm -1 --lr 0.0001 --traj-load-path ./expert_datasets/ant_50.pt --bco-expl-steps 10000 --bco-inv-lr 0.0001 --traj-frac 0.5 --bco-inv-epochs 1 --bco-inv-batch-size 32 --bc-num-epochs 1 --bco-alpha 500 --bco-alpha-size 10000 --traj-batch-size 32 --eval-num-processes 32 --num-render 0 --num-processes 32`
- BC: `python goal_prox/main.py --prefix bc --use-proper-time-limits --linear-lr-decay True --lr 0.001 --num-env-steps 1e7 --alg bc --env-log-dir ~/tmp --sync --vid-fps 30 --save-interval -1 --eval-interval 2000 --env-name AntGoal-v0 --normalize-env False  --il-in-action-norm --il-out-action-norm --traj-load-path ./expert_datasets/ant_50.pt --bc-num-epochs 1000 --eval-num-processes 20 --num-eval 50 --num-render 0 --ant-noise 0.0 --traj-frac 0.5 --traj-val-ratio 0.2`
- GAIL: `python goal_prox/main.py --prefix ant-gaifo-s-0 --linear-lr-decay True --lr 3e-4 --num-env-steps 5e6 --alg gail-deep --env-log-dir ~/tmp --eval-num-processes 1 --env-name AntGoal-v0 --traj-load-path ./expert_datasets/ant_50.pt --cuda False --render-metric --num-steps 500 --traj-frac 0.5 --ant-noise 0.0  --action-input True --num-epochs 10 --num-mini-batch 32 --use-proper-time-limits --il-in-action-norm --il-out-action-norm --eval-interval -1 --save-interval 200  --sync  --gail-reward-norm True  --entropy-coef 0.001 --disc-lr 0.0001`


For running generalization experiments specify the following additional arguments: 
- 0.00 noise: `--ant-noise 0.0`
- 0.01 noise: `--ant-noise 0.01`
- 0.03 noise: `--ant-noise 0.03`
- 0.05 noise: `--ant-noise 0.05`
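
Some of the Ant commands above (BC and GAIL) already contain `--ant-noise 0.0`, while others do not set it at all. When scripting a sweep over the noise levels, a small helper that replaces an existing value or appends the option avoids ending up with the same option twice. This is a minimal sketch: `set_flag` is a hypothetical helper, it assumes the option takes exactly one value, and the base command is abbreviated for illustration.

```python
def set_flag(argv, flag, value):
    """Return a copy of `argv` with `flag` set to `value`.

    Replaces the value after the first occurrence of `flag`, or appends
    `flag value` if the flag is absent. Assumes the flag takes one value.
    """
    argv = list(argv)  # copy so the caller's list is not modified
    value = str(value)
    if flag in argv:
        argv[argv.index(flag) + 1] = value
    else:
        argv += [flag, value]
    return argv


# Abbreviated for illustration; substitute one of the full Ant commands above.
base = ["python", "goal_prox/main.py", "--alg", "gail-deep",
        "--env-name", "AntGoal-v0", "--ant-noise", "0.0"]

for noise in (0.0, 0.01, 0.03, 0.05):  # noise levels listed above
    print(" ".join(set_flag(base, "--ant-noise", noise)))
```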

# Code Structure
- `goal_prox`: method and custom environment code.
  - `goal_prox/method/prox_func.py`: code for our method.
  - `goal_prox/envs/ant.py`: ant locomotion task.
  - `goal_prox/envs/fetch/custom_fetch`: Fetch Pick task.
  - `goal_prox/envs/fetch/custom_push`: Fetch Push task.
  - `goal_prox/envs/fetch/fetch_pickplace_dems`: script to generate Fetch Pick
    demonstrations.
  - `goal_prox/envs/fetch/fetch_push_dems`: script to generate Fetch Push
    demonstrations.
  - `goal_prox/gym_minigrid`: MiniGrid code for navigation environment from
    [maximecb](https://github.com/maximecb/gym-minigrid).
- `rlf`: base RL code and code for imitation learning baselines.
  - `rlf/algos/on_policy/ppo.py`: the PPO policy updater code we use for RL.
  - `rlf/algos/il/bco.py`: the Behavioral Cloning from Observation baseline
    code.
  - `rlf/algos/il/gaifo.py`: the Generative Adversarial Imitation Learning from
    Observations baseline code. Extends the code from regular GAIL from the same
    directory (`rlf/algos/il/gail.py`).

# Acknowledgement
- The grid world environment is from [maximecb](https://github.com/maximecb/gym-minigrid).
- The Fetch environment is adapted, with some tweaks, from [OpenAI](https://github.com/openai/gym/tree/master/gym/envs/robotics/fetch).
- The Ant environment is adapted, with some tweaks, from [DnC](https://github.com/dibyaghosh/dnc).
