# Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs

This repository contains the code to reproduce the main experiments in the paper: Figs. 1, 2, 3 and 4 from the main text, as well as Fig. 7 from Appendix D.

## Installation

The code was written for Python 3.9.7.
The figures are produced in .ipynb notebooks.

The following dependencies are required:

- numpy 1.21.2
- pytorch 1.10.0
- matplotlib 3.4.3

No GPU is required.
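
To confirm that the environment matches the versions listed above, a small check along the following lines may help. This is a sketch, not part of the repository: the function name `check_versions` is hypothetical, and it assumes PyTorch was installed under its PyPI distribution name `torch`.

```python
from importlib import metadata

# Versions taken from the Installation section. "torch" is the PyPI
# distribution name for PyTorch (an assumption about how it was installed).
EXPECTED = {"numpy": "1.21.2", "torch": "1.10.0", "matplotlib": "3.4.3"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        # Ignore local build suffixes such as "+cpu" when comparing.
        base = installed.split("+")[0] if installed else None
        status = "ok" if base == wanted else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A reported difference does not necessarily mean the code will fail; the versions above are simply the ones the experiments were run with.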

## Usage

### Figure 1 (Gridworld)

Create datasets with the command:

    python -m experiments.uncertainty_quantification.data.generate_counts

This populates the folder "experiments/uncertainty_quantification/data/datasets" with the relevant offline datasets.

Run the uncertainty quantification analysis with the command:

    python -m experiments.uncertainty_quantification.main

This saves results in the "results/uncertainty_quantification" folder.
Fig. 1 can then be reproduced by running the relevant cells in the "figures.ipynb" notebook.


### Figure 2 (Gridworld)

Create datasets with the command:

    python -m experiments.policies.data.generate_counts

This populates the folder "experiments/policies/data/datasets" with the relevant offline datasets.

Train the different policies with the commands:

    python -m experiments.policies.mle
    python -m experiments.policies.nominal
    python -m experiments.policies.second_order
    python -m experiments.policies.grad_stochastic  # ours
    python -m experiments.policies.msbi             # for appendix results

Evaluate the policies' posterior expected value with the command:

    python -m experiments.policies.bayes_value

By default, this does not include MSBI (Appendix results) in the evaluation. To include it, manually modify the eval_args dict in experiments/policies/bayes_value.py (line 12).
This saves results in the "results/policies" folder.
Fig. 2 can then be reproduced by running the relevant cells in the "figures.ipynb" notebook.
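
The Figure 2 steps must run in order (datasets, then training, then evaluation), so it can be convenient to chain them from one script. The sketch below is not part of the repository; `build_commands` and `run_pipeline` are hypothetical helper names, and the module list is copied from the commands in this section. MSBI is left out because the default evaluation does not include it. The script is meant to be launched from the repository root.

```python
import subprocess
import sys

# Module names copied from the Figure 2 commands, in the order listed.
FIGURE2_MODULES = [
    "experiments.policies.data.generate_counts",
    "experiments.policies.mle",
    "experiments.policies.nominal",
    "experiments.policies.second_order",
    "experiments.policies.grad_stochastic",
    "experiments.policies.bayes_value",
]


def build_commands(modules=FIGURE2_MODULES):
    """Build one `python -m <module>` command per step."""
    return [[sys.executable, "-m", module] for module in modules]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands unless "--run" is passed explicitly.
    run_pipeline(build_commands(), dry_run="--run" not in sys.argv)
```

Without arguments the script only prints the commands; passing `--run` executes them one after another.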

### Figures 3 and 4

Run the commands:

    python -m synthetic_mdps.main
    python -m synthetic_mdps.msbi  # for appendix results

This saves results to the "synthetic_mdps" folder.
Then, Figs. 3 and 4 can be generated by running the relevant cells in the synthetic_results.ipynb notebook.

### Figure 7

Create datasets with the command:

    python -m uadqn_eval.data.generate_counts

This populates the folder "uadqn_eval/data/datasets" with the relevant offline datasets.

Train the agent to evaluate the MLE-optimal policy on these datasets with the command:

    python -m uadqn_eval.train_gridworld

This saves results in the "results/policies" folder.
Fig. 7 can then be reproduced by running the relevant cells in the "figures.ipynb" notebook.
