Dear reviewers,

Our implementation consists of two code bases; anonymized versions of both are included here.
First, an implementation of Epistemic MCTS in JAX, building on DeepMind's mctx repository: https://github.com/google-deepmind/mctx.
Second, an implementation of an AZ agent in JAX, which was used to generate the results for the SUBLEQ experiments, as well as the SUBLEQ environment, building on the PGX repository: https://github.com/sotetsuk/pgx.

Installation instructions with conda are provided below:

Unzip emctx, e-alphazero and the yaml file.
Execute the following:
```sh
conda env create -f path_to_yaml_dir/environment.yml
conda activate e_az
```
Then, with the `e_az` environment still active, navigate into the emctx directory and install it locally as a package with:
```sh
pip install .
```
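
As an optional sanity check after installation, a short script along the following lines can confirm that the packages are visible in the active environment. This is only a minimal sketch: the import name `emctx` is an assumption based on the directory name and may differ from the name the package actually installs under, and `check_packages` is a hypothetical helper, not part of either code base.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "emctx" is an assumed import name; adjust it if the package installs differently.
    for name, found in check_packages(["jax", "emctx"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, re-running the installation steps with the `e_az` environment activated is the first thing to try.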

To run the E-AZ agent on SUBLEQ, navigate to the `e-alphazero` directory and execute:
```sh
python src/main.py directed_exploration=True exploration_beta=1.0 subleq_task=IDENTITY
```