Setup either walker or maze folder
#######################
To install the necessary dependencies, run the following commands:
conda create --name envname python=3.8
conda activate envname
pip install -r requirements.txt
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
cd ..
pip install pyglet==1.5.11


########################
Running experiments
To run the baseline, there are configuration json files to generate the train.py commands.
Reviewers can refer to the open source link provided in the appendix for more details.
To run our proposed method, MBeDED, please refer to the command.txt file.


### Logging
By default, `train.py` generates a folder in the directory specified by the `--log_dir` argument, named according to `--xpid`. This folder contains the main training logs, `logs.csv`, and periodic screenshots of generated levels in the directory `screenshots`. Each screenshot uses the naming convention `update_<number of PPO updates>.png`. 

### Checkpointing
**Latest checkpoint**
The latest model checkpoint is saved as `model.tar`. The model is checkpointed every `--checkpoint_interval` number of updates. When setting `--checkpoint_basis=num_updates` (default), the checkpoint interval corresponds to number of rollout cycles (which includes one rollout for each student and teacher). Otherwise, when `--checkpoint_basis=student_grad_updates`, the checkpoint interval corresponds to the number of PPO updates performed by the student agent only. This latter checkpoint basis allows comparing methods based on number of gradient updates actually performed by the student agent, which can differ from number of rollout cycles.

**Archived checkpoints**
Separate archived model checkpoints can be saved at specific intervals by specifying a positive value for the argument `--archive_interval`. For example, setting `--archive_interval=1250` and `--checkpoint_basis=student_grad_updates` will result in saving model checkpoints named `model_1250.tar`, `model_2500.tar`, and so on. These archived models are saved in addition to `model.tar`, which always stores the latest checkpoint, based on `--checkpoint_interval`.


## Running experiments
We provide baseline's configuration json files to generate the `train.py` commands for the specific experiment settings featured in the main results of previous works. To generate the command to launch 1 run of the experiment described by the configuration file `config.json` in the folder `train_scripts/grid_configs`, simply run the following, and copy and paste the output into your command line. 

To run our proposed method, MBeDED, please refer to the command.txt file.
