# Code based on Dungeons & Data

This repository contains code for reproducing the experiments on the NetHack dataset from our paper "The Role of Forgetting in Fine-Tuning Reinforcement Learning Models".

To skip the technical details of "Dungeons & Data" and go straight to running the experiments from our paper, jump to the section on running a local experiment.

## Setup


```sh
# Basic setup. Ensure you already have CUDA 10.2+ and cuDNN installed.
conda create -n nle python=3.9
conda activate nle

# Install PyTorch and cmake.
conda install pytorch cudatoolkit=10.2 -c pytorch
conda install cmake

# Install moolib
pip install git+ssh://git@github.com/facebookresearch/moolib

# Install NLE.
pip install git+https://github.com/facebookresearch/nle.git@main

# Get this repo.
git clone --recursive https://github.com/dungeonsdatasubmission/dungeonsdata-neurips2022.git
cd dungeonsdata-neurips2022/code

# Install render_utils and hackrl.
pip install -r requirements.txt
cd render_utils && pip install -e . && cd ..
pip install -e .

# Test NLE.
python -c 'import gym; import nle; env = gym.make("NetHackScore-v0"); env.reset(); env.render()'
```
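As a quick follow-up to the steps above, the sketch below reports which of the installed packages can be imported in the active environment. The helper name `check_imports` is hypothetical, and the import names (`torch`, `moolib`, `nle`, `hackrl`) are assumed to match the packages installed above.

```sh
# Report, one line per module, whether it can be imported by `python`.
# Returns non-zero if at least one module is missing.
check_imports() {
    missing=0
    for module in "$@"; do
        if python -c "import $module" >/dev/null 2>&1; then
            echo "ok      $module"
        else
            echo "MISSING $module"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed import names for the packages installed in the setup steps.
check_imports torch moolib nle hackrl || echo "Some packages are missing; re-run the matching install step."
```

Run it inside the `nle` conda environment; a `MISSING` line points at the install step to repeat.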

## Running the broker

To run an experiment with many peers on different machines, these peers need
to be able to find each other. We use the moolib _broker_ for that purpose.

First, start a moolib broker in a shell on a machine that all peers can reach:

```
python -m moolib.broker
```

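It will output something like `Broker listening at 0.0.0.0:4431`.

Peers reach the broker through its IP address and port. As an example, if you are connected to the broker machine over SSH, you can export both as environment variables: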

```
export BROKER_IP=$(echo "$SSH_CONNECTION" | cut -d' ' -f3)  # Should give your machine's IP.
export BROKER_PORT=4431
```
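When the shell is not an SSH session, `SSH_CONNECTION` is empty and the command above leaves `BROKER_IP` blank. A minimal sketch of a fallback follows; the helper name `broker_ip_from` and the default of `127.0.0.1` are assumptions, and the default is only suitable when the broker and all peers run on the same machine.

```sh
# Print the server-side IP from an SSH_CONNECTION-style string
# ("client_ip client_port server_ip server_port").
# If that field is empty, print the fallback (second argument, default 127.0.0.1).
broker_ip_from() {
    conn="$1"
    fallback="${2:-127.0.0.1}"
    ip=$(echo "$conn" | cut -d' ' -f3)
    echo "${ip:-$fallback}"
}

BROKER_IP="$(broker_ip_from "${SSH_CONNECTION:-}")"
BROKER_PORT=4431
export BROKER_IP BROKER_PORT
echo "Peers should use connect=$BROKER_IP:$BROKER_PORT"
```

For peers on other machines, pass an address they can actually reach as the second argument instead of relying on the loopback default.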

Note that a **single broker is enough** for all your experiments, as long as
the combination of the `project` and `group` flags is unique for each experiment.
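
One way to keep `group` values from colliding is to derive them from a label plus a timestamp. The sketch below is only an illustration: the helper name `unique_group` and the naming scheme are assumptions, not something the code base requires.

```sh
# Build a group name from a label, the current time, and the shell's PID,
# so that separate launches of the same configuration get distinct names.
unique_group() {
    echo "$1-$(date +%Y%m%d-%H%M%S)-$$"
}

GROUP="$(unique_group monk-APPO)"
echo "group=$GROUP"
```

Generate the name once per experiment and pass the same `group=$GROUP` to every peer of that experiment, since peers of one experiment are expected to share its `group`.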

## Run a local experiment

With this information and a running broker, we can start a local experiment:

```
# Run an experiment locally using default arguments.
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT
```

By setting `wandb: true` in `hackrl/config.yaml`,
you can check learning curves on Weights and Biases.

### Run training

To run Behavioural Cloning (BC) Experiments:
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_set=2G exp_point=monk-AA-BC num_actor_cpus=20 total_steps=2_000_000_000 actor_batch_size=256 batch_size=128 ttyrec_batch_size=512 supervised_loss=1 adam_learning_rate=0.001 behavioural_clone=True character='mon-hum-neu-mal' group='monk-AA-BC'
```

To run APPO Experiments: 
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_set=2G num_actor_cpus=20 exp_point=monk-APPO  total_steps=2_000_000_000 character='mon-hum-neu-mal' group='monk-APPO'
```

To run APPO + FT Experiments:
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_set=2G num_actor_cpus=20 exp_point=monk-APPO  total_steps=2_000_000_000 character='mon-hum-neu-mal' group='monk-APPO-FT' use_checkpoint_actor=True unfreeze_actor_steps=0 model_checkpoint_path=/path/to/checkpoint.tar log_forgetting=True forgetting_dataset=<name_of_the_dataset_in_ttyrec_database> kickstarting_path=/path/to/checkpoint.tar 
```

To run APPO + KS Experiments: 
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_set=2G exp_point=monk-APPO-AA-KS total_steps=2_000_000_000 num_actor_cpus=20 kickstarting_loss=0.5 use_kickstarting=true kickstarting_path=/path/to/checkpoint.tar character='mon-hum-neu-mal' group='monk-APPO-AA-KS' use_checkpoint_actor=True unfreeze_actor_steps=0 model_checkpoint_path=/path/to/checkpoint.tar log_forgetting=True forgetting_dataset=<name_of_the_dataset_in_ttyrec_database>
```

To run APPO + BC Experiments:
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_point=monk-APPO-AA-BC num_actor_cpus=20 total_steps=2_000_000_000 batch_size=128 ttyrec_batch_size=256 supervised_loss=0.5 exp_set=2G character='mon-hum-neu-mal' group='monk-APPO-AA-BC' use_checkpoint_actor=True unfreeze_actor_steps=0 model_checkpoint_path=/path/to/checkpoint.tar log_forgetting=True forgetting_dataset=<name_of_the_dataset_in_ttyrec_database> kickstarting_path=/path/to/checkpoint.tar 
```

To run APPO + EWC Experiments:
```
python -m hackrl.experiment connect=$BROKER_IP:$BROKER_PORT exp_set=2G num_actor_cpus=20 exp_point=monk-APPO total_steps=2_000_000_000 character='mon-hum-neu-mal' group='monk-APPO-EWC' freeze_from_the_beginning=False use_ewc=True ewc_penalty_scaler=8000 ewc_n_batches=1000 use_checkpoint_actor=True unfreeze_actor_steps=0 model_checkpoint_path=/path/to/checkpoint.tar log_forgetting=True forgetting_dataset=<name_of_the_dataset_in_ttyrec_database> kickstarting_path=/path/to/checkpoint.tar
```
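
All training commands above share the same `connect`, `exp_set`, `num_actor_cpus`, `total_steps`, and `character` arguments. As a rough sketch, these can be factored into a small helper that prints the full command for a given variant. The function names `common_args` and `experiment_cmd` are hypothetical, and the defaults used when `BROKER_IP` or `BROKER_PORT` are unset are assumptions.

```sh
# Arguments shared by every training command in this section.
# Falls back to 127.0.0.1:4431 if the broker variables are unset (assumption).
common_args() {
    echo "connect=${BROKER_IP:-127.0.0.1}:${BROKER_PORT:-4431} exp_set=2G num_actor_cpus=20 total_steps=2_000_000_000 character=mon-hum-neu-mal"
}

# Print the full command line for one experiment variant.
# Extra arguments are appended after the shared ones.
experiment_cmd() {
    echo "python -m hackrl.experiment $(common_args) $*"
}

# Example: the plain APPO run.
experiment_cmd exp_point=monk-APPO group=monk-APPO
```

The helper only prints the command, which makes it easy to inspect before launching; pipe the output to `sh` to run it.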

### Run evaluation
```
python -m hackrl.eval_array connect=$BROKER_IP:$BROKER_PORT num_actor_cpus=20 num_actor_batches=2 rollouts=1024 batch_size=256 checkpoint_step=100_000_000 wandb=True checkpoint_dir=/path/to/checkpoint.tar
```