# Foundations of Multivariate Distributional RL

## Setup

This project uses [PDM](https://pdm-project.org/latest/) for dependency management. See <https://pdm-project.org/latest/#installation> for installation instructions.

Once PDM has been installed, execute the following from the project root to sync the dependencies:

```bash
pdm venv create
pdm install
```

Before running any code, be sure to activate the virtual environment (from the project root):

```bash
source .venv/bin/activate
```

## Training

The easiest way to run training scripts is with our `justfile`, using the [`just`](https://just.systems/) command runner.

### Figure 2/3
To train agents and gather data for Figure 2,

```bash
just bpd=<int> bayes_beta [sm_cat_projected_td | ewp_td]
```

where `bpd` denotes the number of bins per cumulant dimension. For the EWP algorithm, this will result in EWP representations with `bpd^2` atoms.

To train agents to gather data for Figure 3 (and generally, to use randomized supports for the categorical algorithm), execute

```bash
just dim=<int> num_atoms=<int> bayes_beta_multi [sm_cat_projected_td | ewp_td]
```

where `dim` is the cumulant dimension.

### Figure 4

To generate figure 4, execute

```bash
just bpd=20 termrew [sm_cat_projected_td | ewp_td]
```