# Code submission for Bayesian Generational Population-based Training (BG-PBT)

## Dependencies

The accompanying ```requirements.txt``` is a printout of the environment we used to run the experiments, generated by running:
```conda list -e > requirements.txt```

Pay particular attention to Brax, a package that is still under very active development. We use 0.10.0; using a different version
might lead to significant discrepancies in results.

We recommend running the following script to install the main dependencies:
```
pip install ConfigSpace joblib smac GPy pandas
# Note: this installs Brax from the current main branch; to match our results,
# pin the Brax version listed in requirements.txt instead
pip install git+https://github.com/google/brax.git@main
pip install torch==1.9.0+cu111 gpytorch -f https://download.pytorch.org/whl/torch_stable.html
pip install --upgrade jax==0.2.21 jaxlib==0.1.72+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
```

We use Anaconda Python 3.7.

## Scripts to run experiments in the paper

### Main experiments
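Full BGPBT (with architecture search). Humanoid and Hopper use different hyperparameters from the other environments; see the Appendix
for details. ```{0,1,2,3,100,200,300}``` lists the seeds used in the paper: pass a single seed per run, because pasting the braces into bash as-is triggers brace expansion and passes all seven values to ```--seed```.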
```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 40_000_000 -de 60_000_000 -md 1
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 40_000_000 -de 60_000_000 -md 1
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both  -td 30_000_000
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both -td 30_000_000
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm both -td 30_000_000
```
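Rather than editing the seed placeholder by hand, the runs can be scripted. The following is a minimal sketch (not part of the repository) that generates one command per environment and seed, with the Humanoid/Hopper-specific flags copied from the commands above. By default it only prints the commands; the ```DRY_RUN``` variable is our own convention for this sketch, so set ```DRY_RUN=0``` to launch the runs one after another.

```bash
#!/usr/bin/env bash
# Sketch: one BG-PBT command per (environment, seed) pair.
# Prints the commands by default; run with DRY_RUN=0 to execute them sequentially.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
seeds=(0 1 2 3 100 200 300)
# Flags shared by every environment (adjust -mp to your GPU's VRAM).
common=(-v --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -mt 150_000_000 -o bgpbt -sm both)

n_cmds=0
for env in ant halfcheetah humanoid hopper fetch reacher ur5e; do
  # Humanoid and Hopper use different settings (see the Appendix of the paper).
  case "$env" in
    humanoid|hopper) extra=(-tr 5_000_000 -te 1_000_000 -td 40_000_000 -de 60_000_000 -md 1) ;;
    *)               extra=(-tr 1_000_000 -td 30_000_000) ;;
  esac
  for seed in "${seeds[@]}"; do
    cmd=(python3 -m test_scripts.run_pbt -e "$env" "${common[@]}" "${extra[@]}" --seed "$seed")
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"   # print only
    else
      "${cmd[@]}"        # actually launch the run
    fi
    n_cmds=$((n_cmds + 1))
  done
done
```

Running the seeds sequentially is the simplest option; several runs could also be launched side by side, subject to the VRAM considerations for ```-mp``` described below.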
Here we briefly explain the meaning of the most notable flags (full descriptions may be found in ```./test_scripts/run_pbt.py```):

```-e```: environment {ant/halfcheetah/humanoid/hopper/fetch/reacher/ur5e}

```--pop_size```: population size. We use 8 for all experiments, although in the appendix we also show results with 24 agents.

```-mp --max_parallel```: maximum number of agents that **actually** run at the same time (at most ```--pop_size```).
This needs to be adjusted to the VRAM of your GPU. On a single NVIDIA GeForce RTX 3090 with 24 GB of VRAM,
```-mp 4``` is safe for all experiments except Humanoid, for which we used ```-mp 2``` (lower the value in the Humanoid commands accordingly). A smaller ```-mp```
increases wall-clock time but should not affect the results, because the algorithm is synchronous: it
waits for the entire population to finish before starting the next iteration.

```-qf --quantile_fraction```: the fraction of agents replaced at each iteration (0.125, i.e. 12.5%, in all the commands in this README).

```-ni```: number of initialising agents

```-o --optimizer```: bgpbt/pbt/pb2
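As a rough illustration of how these flags interact, the sketch below computes, for the values used in the main commands, how many agents are replaced per iteration (```--pop_size``` × ```-qf```) and how many sequential batches of at most ```-mp``` agents each iteration needs. The batch count assumes all agents take roughly the same time to train, and how the code rounds a non-integer product is not covered here; 8 × 0.125 is exactly 1, so rounding does not matter for these settings.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope numbers for the flags above (values from the main commands).
pop_size=8               # --pop_size
max_parallel=4           # -mp / --max_parallel
quantile_fraction=0.125  # -qf / --quantile_fraction

# Agents replaced per iteration: pop_size * quantile_fraction.
# Bash has no floating-point arithmetic, so awk does the multiplication.
n_replaced=$(awk -v n="$pop_size" -v q="$quantile_fraction" 'BEGIN { print n * q }')

# Sequential batches per iteration, assuming agents take roughly equal time:
# ceil(pop_size / max_parallel) computed with integer arithmetic.
n_batches=$(( (pop_size + max_parallel - 1) / max_parallel ))

echo "agents replaced per iteration: $n_replaced"
echo "sequential batches per iteration: $n_batches"
```

With ```-mp 2``` (as used for Humanoid), the same formula gives 4 batches per iteration instead of 2. With the 24-agent population from the appendix and ```-qf 0.125``` kept, 3 agents would be replaced per iteration.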


PBT/PB2 baselines: in the commands below, choose one of ```pbt```/```pb2``` and a single seed per run.
The PBT/PB2 implementations are adapted, with minor modifications, from the repository provided by the original authors:
https://github.com/jparkerholder/procgen_autorl

```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o {pbt/pb2} --seed {0,1,2,3,100,200,300} -sm hpo
```
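Note that ```{pbt/pb2}``` contains no comma, so bash passes it to ```-o``` literally instead of expanding it. A minimal sketch (not part of the repository) that loops over both baselines and all seeds for Ant; as above, ```DRY_RUN``` is a convention of this sketch, and commands are only printed unless ```DRY_RUN=0``` is set:

```bash
#!/usr/bin/env bash
# Sketch: expand the {pbt/pb2} and seed placeholders for the Ant baseline runs.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
n_cmds=0
for optimizer in pbt pb2; do
  for seed in 0 1 2 3 100 200 300; do
    cmd=(python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24
         -exist overwrite -tr 1_000_000 -mt 150_000_000 -o "$optimizer" --seed "$seed" -sm hpo)
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"   # print only
    else
      "${cmd[@]}"        # actually launch the run
    fi
    n_cmds=$((n_cmds + 1))
  done
done
```

For the other environments, swap the ```-e``` value and, for Humanoid and Hopper, the ```-tr```/```-te``` values shown in the commands above.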

The configurations found by random search (RS) and Bayesian optimisation (BO, using SMAC3) are stored in ```./smac_baselines.py```. Each configuration is named ```{env_name}_{nas/no_nas}_{same_resource/full}```, where:
- ```nas/no_nas```: whether the search was run in the joint hyperparameter/architecture space or in the hyperparameter space only
- ```same_resource/full```: ```same_resource``` is the best configuration found after 8 random search steps; ```full``` is the one found by running sequential SMAC for the full 50 steps.
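The naming scheme can be expanded with bash brace expansion to list every candidate configuration name. This is only a sketch: it assumes every environment has all four variants, which should be checked against ```./smac_baselines.py``` itself.

```bash
#!/usr/bin/env bash
# Expand {env_name}_{nas/no_nas}_{same_resource/full} into concrete names.
# Assumption: all 7 x 2 x 2 combinations exist in smac_baselines.py.
names=( {ant,halfcheetah,humanoid,hopper,fetch,reacher,ur5e}_{nas,no_nas}_{same_resource,full} )
printf '%s\n' "${names[@]}"
```

Brace expansion keeps the left-to-right order, so the list starts with ```ant_nas_same_resource```, ```ant_nas_full```, ```ant_no_nas_same_resource```, and so on.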

### Ablation studies

BGPBT without distillation and architecture search
```
python3 -m test_scripts.run_pbt -v -e ant --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e halfcheetah --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e humanoid --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e hopper --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 5_000_000 -te 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e fetch --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo 
python3 -m test_scripts.run_pbt -v -e reacher --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
python3 -m test_scripts.run_pbt -v -e ur5e --pop_size 8 -mp 4 -qf 0.125 -ni 24 -exist overwrite -tr 1_000_000 -mt 150_000_000 -o bgpbt --seed {0,1,2,3,100,200,300} -sm hpo
```

### Intermediate results and plotting scripts
See ```./data/plot.ipynb``` for how to generate the figures in the paper; ```./data``` also contains the raw CSV files generated by running the scripts.

### Checkpoints to visualize the policies
We include some checkpoints in ```./checkpoints```, together with an associated notebook to quickly render the policies found by BGPBT.