# TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

This is the official codebase for the paper "TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design".

This codebase is built on top of the [Dual Curriculum Design (DCD) codebase](https://github.com/facebookresearch/dcd).

## Setup

To install the necessary dependencies, run the following commands:

```
conda create --name dcd python=3.8
conda activate dcd
pip install -r requirements.txt
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
cd ..
pip install pyglet==1.5.11
```

Note: You may encounter the following error while running the code:

```
AttributeError: module 'numpy' has no attribute 'bool'.
```

This error means that `np.bool` has been removed from your installed NumPy version (the alias was deprecated in NumPy 1.20 and removed in NumPy 1.24). To fix it, open the line indicated in the traceback and replace `np.bool` with `np.bool_`.

The change is behavior-preserving and safe. For full details on the deprecation, see:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
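
If several lines are affected, the replacement can be scripted. The snippet below is a minimal sketch and not part of this codebase: `patch_np_bool` is a hypothetical helper name, and it assumes the affected library imports NumPy as `np`. It rewrites only the bare `np.bool` token and leaves names such as `np.bool_` untouched.

```python
import re

# Matches the bare token `np.bool`, but not `np.bool_`, `np.bool8`, or `jnp.bool`.
_NP_BOOL = re.compile(r"\bnp\.bool\b")


def patch_np_bool(source: str) -> str:
    """Return `source` with every bare `np.bool` replaced by `np.bool_`."""
    return _NP_BOOL.sub("np.bool_", source)


if __name__ == "__main__":
    line = "mask = np.zeros(4, dtype=np.bool)"
    print(patch_np_bool(line))
```

Applying the function to the text of the file named in the traceback, and reviewing the diff before saving, keeps the change limited to the deprecated alias.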

## A quick overview of [`train.py`](https://github.com/facebookresearch/dcd/blob/master/train.py)

### Choosing a UED algorithm

The exact UED algorithm is specified by a combination of values for `--use_plr`, `--no_exploratory_grad_updates`, `--ued_editor`, and `--use_traced`:

| Method | `use_plr`| `no_exploratory_grad_updates` | `ued_editor`| `use_traced`|
| ------------- |:-------------|:-------------|:-------------|:-------------|
| DR | `false` | `false` | `false` | `false` |
| PLR | `true` | `false` | `false` | `false` |
| PLR<sup>⊥</sup> | `true` | `true` | `false` | `false` |
| ACCEL | `true` | `true` | `true` | `false` |
| TRACED | `true` | `true` | `true` | `true` |
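
The table can also be read as a lookup from flag values to a method name. The sketch below is illustrative only: `UEDFlags`, `METHODS`, and `method_name` are hypothetical names that do not appear in `train.py`, and the mapping simply restates the rows above.

```python
from typing import NamedTuple


class UEDFlags(NamedTuple):
    """The four flag values from the table, in column order."""
    use_plr: bool
    no_exploratory_grad_updates: bool
    ued_editor: bool
    use_traced: bool


# One entry per table row; "Robust PLR" is the row written as PLR with the ⊥ superscript.
METHODS = {
    UEDFlags(False, False, False, False): "DR",
    UEDFlags(True, False, False, False): "PLR",
    UEDFlags(True, True, False, False): "Robust PLR",
    UEDFlags(True, True, True, False): "ACCEL",
    UEDFlags(True, True, True, True): "TRACED",
}


def method_name(flags: UEDFlags) -> str:
    """Return the method for a flag combination, or raise if it is not in the table."""
    try:
        return METHODS[flags]
    except KeyError:
        raise ValueError(f"Flag combination not listed in the table: {flags}") from None


if __name__ == "__main__":
    print(method_name(UEDFlags(True, True, True, True)))
```

A lookup like this makes it easy to spot unsupported combinations, for example `use_traced` without `ued_editor`, before launching a run.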

Full details for the command-line arguments related to PLR and ACCEL can be found in [`arguments.py`](https://github.com/facebookresearch/dcd/blob/master/arguments.py). We provide simple configuration JSON files for generating the `train.py` commands for the best hyperparameters found in experimental settings from prior works.

### Logging

By default, `train.py` generates a folder in the directory specified by the `--log_dir` argument, named according to `--xpid`. This folder contains the main training logs, `logs.csv`, and periodic screenshots of generated levels in the directory `screenshots`. Each screenshot uses the naming convention `update_<number of PPO updates>.png`. When ACCEL is turned on, the screenshot naming convention also includes information about whether the level was replayed via PLR and the mutation generation number for the level, i.e. how many mutation cycles led to this level.
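
As a rough illustration of this layout, the sketch below builds the run directory from `--log_dir` and `--xpid` and extracts the update number from a screenshot name. The helper names `run_dir` and `screenshot_update` are hypothetical, and the sketch assumes that screenshot names still begin with `update_<number>` when ACCEL adds extra information, which the text above does not spell out.

```python
import re
from pathlib import Path
from typing import Optional


def run_dir(log_dir: str, xpid: str) -> Path:
    """Folder that holds `logs.csv` and `screenshots` for one run."""
    return Path(log_dir) / xpid


def screenshot_update(filename: str) -> Optional[int]:
    """Return the PPO update number from a name like `update_120.png`, else None."""
    match = re.match(r"update_(\d+)", filename)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    folder = run_dir("logs", "my_run")  # placeholder values
    print(folder / "logs.csv")
    print(folder / "screenshots")
    print(screenshot_update("update_120.png"))
```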

### Checkpointing

**Latest checkpoint**
The latest model checkpoint is saved as `model.tar`. The model is checkpointed every `--checkpoint_interval` updates. With `--checkpoint_basis=num_updates` (the default), the checkpoint interval counts rollout cycles (each of which includes one rollout for each student and teacher). With `--checkpoint_basis=student_grad_updates`, the checkpoint interval counts only the PPO updates performed by the student agent. This latter basis allows methods to be compared by the number of gradient updates actually performed by the student agent, which can differ from the number of rollout cycles, because methods based on Robust PLR, like ACCEL, do not perform student gradient updates every rollout cycle.

**Archived checkpoints**
Separate archived model checkpoints can be saved at specific intervals by specifying a positive value for the argument `--archive_interval`. For example, setting `--archive_interval=1250` and `--checkpoint_basis=student_grad_updates` will result in saving model checkpoints named `model_1250.tar`, `model_2500.tar`, and so on. These archived models are saved in addition to `model.tar`, which always stores the latest checkpoint, based on `--checkpoint_interval`.
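
To make the naming concrete, the sketch below lists the archive files one would expect after a given number of updates. The function name `archived_checkpoint_names` is hypothetical, and the sketch assumes an archive is written exactly when the update counter (measured on the chosen `--checkpoint_basis`) reaches a multiple of `--archive_interval`; the actual saving logic lives in the training code and may differ in detail.

```python
from typing import List


def archived_checkpoint_names(archive_interval: int, num_updates: int) -> List[str]:
    """List expected archive file names for a run that has reached `num_updates`."""
    if archive_interval <= 0:
        # Archiving is only enabled for positive intervals.
        return []
    return [
        f"model_{n}.tar"
        for n in range(archive_interval, num_updates + 1, archive_interval)
    ]


if __name__ == "__main__":
    print(archived_checkpoint_names(1250, 3000))
```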

## Evaluating agents with [`eval.py`](https://github.com/facebookresearch/dcd/blob/master/eval.py)

### Evaluating a single model

The following command evaluates a `<model>.tar` in an experiment results directory, `<xpid>`, in a base log output directory `<log_dir>` for `<num_episodes>` episodes in each of the environments named `<env_name1>`, `<env_name2>`, and `<env_name3>`, and outputs the results as a .csv in `<result_dir>`.

```shell
python -m eval \
--base_path <log_dir> \
--xpid <xpid> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--result_path <result_dir>
```

### Evaluating multiple models

Similarly, the following command evaluates all models named `<model>.tar` in experiment results directories matching the prefix `<xpid_prefix>`. This prefix argument is useful for evaluating models from a set of training runs with the same hyperparameter settings. The resulting .csv will contain a column for each model matched and evaluated this way.

```shell
python -m eval \
--base_path <log_dir> \
--prefix <xpid_prefix> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--accumulator mean \
--result_path <result_dir>
```

### Evaluating on zero-shot benchmarks

Replacing the `--env_names=...` argument with the `--benchmark=<benchmark>` argument will perform evaluation over a set of benchmark test environments for the domain specified by `<benchmark>`. The various zero-shot benchmarks are described below:

| `benchmark` | Description |
| ------------- |:-------------|
| `maze` | Human-designed mazes, including singleton and procedurally-generated designs. |
| `bipedal` | `BipedalWalker-v3`, `BipedalWalkerHardcore-v3`, and isolated challenges for stairs, stumps, pit gaps, and ground roughness. |

## Running experiments

We provide configuration JSON files to generate the `train.py` commands for the specific experiment settings featured in the main results of previous works. To generate the command that launches one run of the experiment described by the configuration file `config.json` in the folder `train_scripts/grid_configs`, run the following, then copy and paste the output into your command line.

```shell
python train_scripts/make_cmd.py --json config --num_trials 1
```

Alternatively, on macOS, you can run the following to copy the command directly to your clipboard (`pbcopy` is a macOS utility):

```shell
python train_scripts/make_cmd.py --json config --num_trials 1 | pbcopy
```
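
For the configuration paths listed in the Environments section, the `--json` value can be derived from the table entry. The sketch below is a guess at that convention, not a documented interface: `make_cmd_args` is a hypothetical helper, and it assumes `--json` takes the path relative to `train_scripts/grid_configs` without the `.json` extension, as the `config.json` to `--json config` example suggests. This has not been verified here for nested paths.

```python
from pathlib import PurePosixPath
from typing import List


def make_cmd_args(config_path: str, num_trials: int = 1) -> List[str]:
    """Build the `make_cmd.py` invocation for a config path such as `config.json`."""
    # Assumption: the extension is dropped, mirroring `config.json` -> `--json config`.
    json_arg = str(PurePosixPath(config_path).with_suffix(""))
    return ["python", "train_scripts/make_cmd.py",
            "--json", json_arg, "--num_trials", str(num_trials)]


if __name__ == "__main__":
    print(" ".join(make_cmd_args("bipedal/bipedal_traced.json")))
```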

The JSON files for training each method with the best hyperparameter settings in each environment are listed below.

## Environments

### 🧭 MiniGrid Mazes

The [MiniGrid-based mazes](https://github.com/facebookresearch/dcd/tree/master/envs/multigrid) from [Dennis et al, 2020](https://arxiv.org/abs/2012.02096) and [Jiang et al, 2021](https://arxiv.org/abs/2110.02439) require agents to perform partially-observable navigation. Various human-designed singleton and procedurally-generated mazes allow testing of zero-shot transfer performance to out-of-distribution configurations.

#### Experiments from [Jiang et al, 2021](https://arxiv.org/abs/2110.02439)

| Method          | json config                                 |
| --------------- | :------------------------------------------ |
| PLR<sup>⊥</sup> | `minigrid/25_blocks/mg_25b_robust_plr.json` |
| PLR             | `minigrid/25_blocks/mg_25b_plr.json`        |
| DR              | `minigrid/25_blocks/mg_25b_dr.json`         |

#### Experiments from [Parker-Holder et al, 2022](https://accelagent.github.io/)

| Method                                 | json config                                              |
| -------------------------------------- | :------------------------------------------------------- |
| TRACED                                 | `minigrid/60_blocks_uniform/mg_60b_uni_traced.json`      |
| ACCEL (from empty)                     | `minigrid/60_blocks_uniform/mg_60b_uni_accel_empty.json` |
| PLR<sup>⊥</sup> (Uniform(0-60) blocks) | `minigrid/mg_60b_uni_robust_plr.json`                    |
| DR (Uniform(0-60) blocks)              | `minigrid/mg_60b_uni_dr.json`                            |

### 🦿🦿 BipedalWalker

The [BipedalWalker environment](https://github.com/facebookresearch/dcd/tree/master/envs/bipedalwalker) requires continuous control of a 2D bipedal robot over challenging terrain with various obstacles, using a proprioceptive observation. The zero-shot transfer configurations, used in [Parker-Holder et al, 2022](https://accelagent.github.io/), include `BipedalWalkerHardcore`, environments featuring each challenge (i.e. ground roughness, stump, pit gap, and stairs) in isolation, as well as extremely challenging configurations discovered by POET in [Wang et al, 2019](https://arxiv.org/abs/1901.01753).

| Method          | json config                       |
| --------------- | :-------------------------------- |
| TRACED          | `bipedal/bipedal_traced.json`     |
| ACCEL           | `bipedal/bipedal_accel.json`      |
| PLR<sup>⊥</sup> | `bipedal/bipedal_robust_plr.json` |
| DR              | `bipedal/bipedal_dr.json`         |

### Current environment support

| Method          | 🧭 MiniGrid mazes | 🦿🦿 BipedalWalker |
| --------------- | :---------------- | :----------------- |
| TRACED          | ✅                | ✅                 |
| ACCEL           | ✅                | ✅                 |
| PLR<sup>⊥</sup> | ✅                | ✅                 |
| PLR             | ✅                | ✅                 |
| DR              | ✅                | ✅                 |

## License

This project is licensed under CC BY-NC 4.0.  
Modules derived from PyTorch (e.g., the RNN code) are licensed under BSD-3-Clause.