# Improving Intrinsic Exploration with Language Abstractions

This is the codebase accompanying the NeurIPS submission "Improving Intrinsic
Exploration with Language Abstractions".

Note to reviewers:
1. The codebase will be cleaned up further in time for the camera-ready submission and released on GitHub.
2. Files in the `minihack`, `subgoals/minigrid`, and `subgoals/babyai` folders contain institutional affiliations, but these do *not* reveal author identity: the folders are clones of existing open-source repositories (with slight modifications), available at https://github.com/facebookresearch/minihack, https://github.com/maximecb/gym-minigrid, and https://github.com/mila-iqia/babyai/. Our original code is in `langexplore/` and is fully anonymized.

# Setup

Tested with Python 3.9. It's recommended to create a `conda` env:

```
conda create -n langexplore python=3.9
conda activate langexplore
pip install -r requirements.txt
```

Note that you may need to follow the setup instructions at https://pytorch.org/
to install PyTorch on your own machine.

**To run NovelD and L-NovelD on MiniHack**, which uses a separate codebase, you
will additionally need to install `polybeast`. Follow the instructions in
`minihack/README.md` (same as the README here:
https://github.com/facebookresearch/minihack) for `Baseline Agents`.

# Codebase

- `langexplore`: the main codebase, which contains implementations of the
    baseline models, AMIGo, L-AMIGo, NovelD, and L-NovelD on MiniGrid, and of
    AMIGo and L-AMIGo on MiniHack. NovelD on MiniHack is not implemented in
    this main codebase (see the `minihack` folder below).
    - `train.py` is the main training code, which defines learner and actor
        threads, as well as AMIGo teacher training functions (learning
        generator grounder and policy). Also contains code for plotting
        language goals and logging to wandb.
    - `losses.py` contains implementations of loss functions.
    - `optimizers.py` contains functions for creating optimizers.
    - `buffers.py` contains slightly modified implementations of shared-memory
        buffers for actor-learner communication, based off of TorchBeast.
    - `conf` contains experiment config files for Hydra.
    - `models` contains implementations of both MiniGrid (i.e. BabyAI) and
        MiniHack AMIGo teachers, L-AMIGo teachers, students, and RND models.
    - `torchbeast` contains files from TorchBeast https://github.com/facebookresearch/torchbeast
    - `utils` contains misc utilities.
- `minihack`: this is a modified clone of https://github.com/facebookresearch/minihack. Contains code for NovelD and L-NovelD for MiniHack. This package is installed when installing from `requirements.txt`.
    - Within this codebase we have modified `minihack.agent.polybeast.intrinsic` for NovelD and L-NovelD.
- `subgoals`: modified clones of
    https://github.com/maximecb/gym-minigrid/ and
    https://github.com/mila-iqia/babyai/ where environments are modified to
    return completed language goals. Packages here are installed when
    installing from `requirements.txt`.
- `data` contains misc data files, mostly BERT vocab.

This `langexplore` codebase should exist in the home directory for some
absolute paths to work. If you don't want to store it in `$HOME` you will need
to replace all instances of `oc.env:HOME` in the config files with your desired
directory:

- `msg.vocab_file` key in `langexplore/conf/minihack/config.yaml`
- `savedir` key in `langexplore/conf/config.yaml`
- `vocab_file` key in `minihack/agent/polybeast/conf/config.yaml`
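
The replacement described above can be scripted. The following is a minimal sketch, assuming the configs use the standard OmegaConf interpolation syntax `${oc.env:HOME}`; the function names `replace_home` and `rewrite_configs` are hypothetical helpers, not part of this codebase. Back up the config files before running it.

```python
import re
from pathlib import Path

# Config files named in this README, relative to the repository root.
CONFIG_FILES = [
    "langexplore/conf/minihack/config.yaml",
    "langexplore/conf/config.yaml",
    "minihack/agent/polybeast/conf/config.yaml",
]

# Assumption: the interpolation is written as ${oc.env:HOME}.
HOME_PATTERN = re.compile(r"\$\{\s*oc\.env:HOME\s*\}")


def replace_home(text: str, new_root: str) -> str:
    """Replace every ${oc.env:HOME} interpolation in `text` with `new_root`."""
    # A callable replacement avoids backslash-escape handling in `new_root`.
    return HOME_PATTERN.sub(lambda _match: new_root, text)


def rewrite_configs(repo_root: str, new_root: str) -> list:
    """Rewrite the listed config files in place; return the ones that changed."""
    changed = []
    for rel in CONFIG_FILES:
        path = Path(repo_root) / rel
        if not path.exists():
            continue  # skip files that are missing from this checkout
        old = path.read_text()
        new = replace_home(old, new_root)
        if new != old:
            path.write_text(new)
            changed.append(rel)
    return changed
```

For example, `rewrite_configs(".", "/scratch/me")` run from the repository root would point all three keys at `/scratch/me` (a placeholder directory).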

# Running Experiments

## Everything except NovelD and L-NovelD on MiniHack

Use the scripts `scripts/run_minigrid.sh` and `scripts/run_minihack.sh`, which
contain preconfigured arguments and hyperparameters for all experiments and
ablations in the paper.  Experiments use Hydra to manage configuration.

Visualizing results requires `wandb`: configure project name with the `project`
key in `langexplore/conf/config.yaml`.

As an example of how to run, this runs L-AMIGo on the key3 environment:

```
OMP_NUM_THREADS=1 python -m langexplore.train +experiment=key3 group=experiment-name \
    language_goals=online_grounding generator=true noveld=false
```

`OMP_NUM_THREADS=1` is essential to prevent CPU ops from hanging.

The `language_goals` flag can be `null` (AMIGo), `online_grounding` (L-AMIGo), or `online_naive` (L-AMIGo without the grounding network). To run the other methods:

- NovelD: set `generator=false` and `noveld=true`.
- IMPALA baseline: set `generator=false`.
- L-NovelD: set `generator=false`, `noveld=true`, and `separate_message_novelty=true`.
- Naive (fixed) message reward baseline: set `naive_message_reward=0.1`.
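
The flag combinations above can be collected into a small lookup table that prints the matching command line. This is only a sketch: the method keys (`"l-amigo"`, `"noveld"`, and so on) and the `build_command` helper are hypothetical names, each entry lists only the flags this README mentions, and the AMIGo variants assume the same `generator=true noveld=false` settings as the example command.

```python
import shlex

# Overrides per method, transcribed from this README. Flags not listed here
# fall back to the defaults in langexplore/conf/config.yaml.
METHOD_OVERRIDES = {
    "amigo": ["language_goals=null", "generator=true", "noveld=false"],
    "l-amigo": ["language_goals=online_grounding", "generator=true", "noveld=false"],
    "l-amigo-no-grounding": ["language_goals=online_naive", "generator=true", "noveld=false"],
    "noveld": ["generator=false", "noveld=true"],
    "l-noveld": ["generator=false", "noveld=true", "separate_message_novelty=true"],
    "impala": ["generator=false"],
    "naive-message-reward": ["naive_message_reward=0.1"],
}


def build_command(method: str, experiment: str, group: str) -> str:
    """Return the shell command that trains `method` on `experiment`."""
    args = [
        "python", "-m", "langexplore.train",
        f"+experiment={experiment}",
        f"group={group}",
        *METHOD_OVERRIDES[method],
    ]
    # OMP_NUM_THREADS=1 prevents CPU ops from hanging (see above).
    return "OMP_NUM_THREADS=1 " + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_command("l-noveld", "key3", "experiment-name"))
```

The helper only builds the string; paste the printed command into a shell to run it.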

The environment is selected with the `+experiment=` flag, each of which
corresponds to a YAML file in `langexplore/conf/experiment/`. See that folder
for the list of available experiments.
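
As a convenience, the valid `+experiment=` values can be listed from Python. This is a minimal sketch that assumes each experiment is a top-level `*.yaml` file in that folder; `list_experiments` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path


def list_experiments(conf_dir: str = "langexplore/conf/experiment") -> list:
    """Return the experiment names (YAML file stems) found in `conf_dir`."""
    return sorted(p.stem for p in Path(conf_dir).glob("*.yaml"))


if __name__ == "__main__":
    # Run from the repository root.
    for name in list_experiments():
        print(f"+experiment={name}")
```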

Other hyperparameters are explained in `langexplore/conf/config.yaml`.
MiniHack-specific hyperparameters are documented in
`langexplore/conf/minihack/config.yaml`.

## NovelD and L-NovelD on MiniHack

This uses the agent in the `minihack` package located in this codebase. As
mentioned in Setup, you will first need to install polybeast (follow the
instructions in `minihack/README.md`). Then use `minihack/scripts/run.sh`,
which contains documented commands that run NovelD, L-NovelD, and ablations on
MiniHack. Most parameters match those above, with slightly different names;
comparing the hyperparameters across the commands should help you infer what
each one does.

## Slurm

To submit to Slurm, append the following to any of the commands:
```
+launcher=slurm hydra.launcher.timeout_min=$TIME & sleep 2
```
The point of `sleep 2` is to give successive Slurm submissions some buffer time
so that they do not write experiment outputs to the same (timestamped) files.
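
When submitting many runs, the same idea can be scripted. The following is a rough sketch that appends the launcher overrides shown above to each command and pauses between launches; `with_slurm` and `submit_all` are hypothetical helpers, and the `launch` parameter exists only so the launching step can be swapped out (for example, for a dry run).

```python
import subprocess
import time


def with_slurm(command: str, timeout_min: int) -> str:
    """Append the Slurm launcher overrides from this README to `command`."""
    return f"{command} +launcher=slurm hydra.launcher.timeout_min={timeout_min}"


def submit_all(commands, timeout_min, delay_s=2.0, launch=None):
    """Launch each command in the background, waiting `delay_s` between launches."""
    if launch is None:
        # Default: start the command in a shell without waiting for it to finish.
        launch = lambda cmd: subprocess.Popen(cmd, shell=True)
    handles = []
    for i, cmd in enumerate(commands):
        if i > 0:
            # Buffer time so successive runs get distinct timestamped outputs.
            time.sleep(delay_s)
        handles.append(launch(with_slurm(cmd, timeout_min)))
    return handles
```

Passing `launch=print` prints the final commands instead of running them, which is a cheap way to check the overrides before submitting.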

## Interpreting results

The most important wandb plot is `mean_episode_return`, which plots extrinsic
reward. Other logged metrics should be self-explanatory. Supplementary
visualizations of L-AMIGo and L-NovelD are uploaded as plotly plots in the
wandb dashboard. `all_templates_norm` plots the proportion of goals proposed by
the L-AMIGo teacher. `achieved_templates_norm` plots the proportion of goals
*achieved* by the L-AMIGo student.

To visualize NovelD throughout training, set `plot_novelty=true` in Hydra for
the `langexplore` codebase, or `noveld.plot=true` in
`minihack/agent/polybeast/conf/config.yaml` for MiniHack experiments.
`noveld_plot` then plots message novelty over time.

# Questions?

For any questions, please reach out to the authors during the rebuttal phase!
