# Benchmark Tasks - Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

This folder contains the modified code for running the six benchmark tasks "Hell-Heaven-3", "Shopping-5", "Car-Flag",
"Cleaner", "7x7-Memory-Four-Room", and "9x9-Memory-Four-Room", and for visualizing the corresponding learning curves
(cf. Figure 1). It also contains the underlying data for the results presented in the article's main body.

In this work, we extend the implementation of the partially observable environments and A2C variants presented by
Baisero and Amato (2022) to the informed setting.

## Project setup

We recommend using the "aac-310" conda environment you created earlier to run the code.

First, activate the conda environment you created earlier. Then install the following repositories in your workspace:
the `asym-rlpo` base repo, the `gym-gridverse` repo to run the gridverse experiments, the `gym-pomdps` and `rl-parsers`
repos to run the classic POMDP experiments, and the `one-to-one` repo, which provides supporting functionality.

### Retrieving asym-rlpo

To retrieve `asym-rlpo`, clone the `asym-rlpo` [GitHub repo](https://github.com/abaisero/asym-rlpo.git) and pin it to
commit "fef2740b4ee8dfdfbb57e3b17415a2a88f5310d5".
This can be done as follows:

```bash
git clone git@github.com:abaisero/asym-rlpo.git
cd asym-rlpo
git reset --hard fef2740
```

### Other dependencies

It is advised to install the other repositories in the following order:

- https://github.com/abaisero/rl-parsers @ commit "da1b48816a170cf8688da5556e4bb4dd51121b4f"
- https://github.com/abaisero/one-to-one @ commit "0ae94b5c8ce928388cb312eb6d0adfae0629c29b"
- https://github.com/abaisero/gym-pomdps @ commit "81ffa3781790b2127164a2d6f726ef9ced0ea956"
- https://github.com/abaisero/gym-gridverse @ commit "f036a612f1300122cf70bbac42f2fb37d3f8990b"

Please install these in editable mode, which makes it easier to pull
changes from the remote repositories. For example, run the
following commands to install the `rl-parsers` code:

```bash
git clone git@github.com:abaisero/rl-parsers.git
cd rl-parsers
git reset --hard da1b488
python -m pip install -e .
```

Repeat the above for all four prerequisite repositories. Finally, from within the `asym-rlpo` folder, install the
packages listed in its `requirements.txt` into the conda environment:

```bash
pip install -r requirements.txt
```
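The clone, pin, and install steps for the four prerequisite repositories can also be scripted. The following is a minimal sketch, not part of the code archive: the helper names `install_commands` and `install_all` are hypothetical, the URLs and commits are copied from the list above, and HTTPS clone URLs are used instead of the SSH form shown earlier. By default it only prints the commands (dry run).

```python
import subprocess

# (repository URL, pinned commit) pairs copied from the list above, in the advised order.
PINNED_REPOS = [
    ("https://github.com/abaisero/rl-parsers", "da1b48816a170cf8688da5556e4bb4dd51121b4f"),
    ("https://github.com/abaisero/one-to-one", "0ae94b5c8ce928388cb312eb6d0adfae0629c29b"),
    ("https://github.com/abaisero/gym-pomdps", "81ffa3781790b2127164a2d6f726ef9ced0ea956"),
    ("https://github.com/abaisero/gym-gridverse", "f036a612f1300122cf70bbac42f2fb37d3f8990b"),
]


def install_commands(url, commit):
    """Return (command, working directory) pairs for one repository."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return [
        (["git", "clone", url + ".git", name], "."),
        (["git", "reset", "--hard", commit], name),
        (["python", "-m", "pip", "install", "-e", "."], name),
    ]


def install_all(dry_run=True):
    """Print every command; execute it as well when dry_run is False."""
    for url, commit in PINNED_REPOS:
        for cmd, cwd in install_commands(url, commit):
            print(f"[{cwd}] {' '.join(cmd)}")
            if not dry_run:
                subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    install_all(dry_run=True)
```

Run it from your workspace folder with the conda environment activated, and switch to `install_all(dry_run=False)` once the printed commands look right.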

Afterwards, copy all files from the `benchmark-task` folder of our code archive into the corresponding folders in your
workspace, replacing any existing files of the same name.

## Running experiments

Our experiment script is `asym-rlpo/run_experiments.py`. Using the command

```shell
python asym-rlpo/run_experiments.py
```

you will run the default settings: informed asymmetric A2C together with all three baselines on
the `POMDP-heavenhell_3-episodic-v0` environment for `6_250_000` steps each.

The following list maps each environment name used in the article's main body to the environment id used in the code:

- HeavenHell-3: `POMDP-heavenhell_3-episodic-v0`
- Shopping-5: `POMDP-shopping_5-episodic-v1`
- Car Flag: `extra-car-flag-v0`
- Cleaner: `extra-cleaner-v0`
- 7x7-memory-4-rooms: `../gym-gridverse/yaml/gv_memory_four_rooms.7x7.yaml`
- 9x9-memory-4-rooms: `../gym-gridverse/yaml/gv_memory_four_rooms.9x9.yaml`

To change the environment, use the `--envs` flag.
You can pass multiple environments as long as they share the same hyperparameters.
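As an illustration, the name-to-id mapping above can be turned into a small lookup that builds the command line. This is a sketch only: `ENV_IDS` and `build_command` are hypothetical names, and it assumes that `--envs` accepts one or more space-separated ids, which you should verify against `asym-rlpo/run_experiments.py`.

```python
import shlex

# Article name -> environment id, copied from the list above.
ENV_IDS = {
    "HeavenHell-3": "POMDP-heavenhell_3-episodic-v0",
    "Shopping-5": "POMDP-shopping_5-episodic-v1",
    "Car Flag": "extra-car-flag-v0",
    "Cleaner": "extra-cleaner-v0",
    "7x7-memory-4-rooms": "../gym-gridverse/yaml/gv_memory_four_rooms.7x7.yaml",
    "9x9-memory-4-rooms": "../gym-gridverse/yaml/gv_memory_four_rooms.9x9.yaml",
}


def build_command(*names):
    """Return the argument list for running the named environments."""
    # ASSUMPTION: `--envs` takes space-separated values; check run_experiments.py.
    ids = [ENV_IDS[name] for name in names]
    return ["python", "asym-rlpo/run_experiments.py", "--envs", *ids]


if __name__ == "__main__":
    # Print a copy-pasteable command for a single environment.
    print(shlex.join(build_command("Shopping-5")))
```

`build_command` also accepts several names at once, for environments that share the same hyperparameters.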

To visualize the learning curves (cf. Figure 1), run `asym-rlpo/visualize_learning_curves.py`.

### Important remark

As we use Weights & Biases to log our results, you need to log in to a wandb account within your conda environment and
add your credentials/settings in `asym-rlpo/main_a2c.py` prior to running the experiments:

```python
parser.add_argument('--wandb-entity', default='<YOUR_USERNAME>')
parser.add_argument('--wandb-project', default='<YOUR_PROJECT_NAME>')
parser.add_argument('--wandb-group', default='<YOUR_GROUP_NAME>')
```
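If you prefer not to hard-code these values, one possible alternative is to read the defaults from the environment. The sketch below is not the code in `main_a2c.py`; `make_parser` is a hypothetical helper, and it relies on `WANDB_ENTITY`, `WANDB_PROJECT`, and `WANDB_RUN_GROUP`, the environment variables that wandb itself recognizes.

```python
import argparse
import os


def make_parser():
    """Build a parser whose wandb defaults come from environment variables."""
    parser = argparse.ArgumentParser()
    # Each default is None when the corresponding variable is unset.
    parser.add_argument('--wandb-entity', default=os.environ.get('WANDB_ENTITY'))
    parser.add_argument('--wandb-project', default=os.environ.get('WANDB_PROJECT'))
    parser.add_argument('--wandb-group', default=os.environ.get('WANDB_RUN_GROUP'))
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(args.wandb_entity, args.wandb_project, args.wandb_group)
```

With this pattern, a value passed on the command line still takes precedence over the environment variable.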

By default, results are stored locally ("dryrun") and need to be synchronized manually, for example with the
`wandb sync` command. To sync logs with the Weights & Biases cloud automatically, change the following setting in
`asym-rlpo/run_experiments.py`:

```python
os.environ["WANDB_MODE"] = "online"
```
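To switch modes without editing the script each time, the assignment could be replaced by a fallback that respects a value already set in the shell. This is a sketch under that assumption, not the behavior of the provided `run_experiments.py`; `configure_wandb_mode` is a hypothetical helper.

```python
import os


def configure_wandb_mode(default="dryrun"):
    """Keep WANDB_MODE if the shell already set it, else use `default`.

    "dryrun" is the local-only default named in this README.
    """
    return os.environ.setdefault("WANDB_MODE", default)


if __name__ == "__main__":
    print(configure_wandb_mode())
```

Launching the script with `WANDB_MODE=online` set in the shell would then enable cloud syncing for that run only.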


