# Games for AI Control

This repo contains all the code needed to generate the results and plots from our paper _Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols_.

## Setup

Initialise a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) and inside it, run `conda install --file {linux|windows}_env.yml` (depending on your distribution) in order to install all the required packages.

## Base code

All the underlying code needed to run the results-generating notebooks can be found in the `/ai_control_games/` directory. In the directory itself are general tools and classes that can be used to model any AI-Control Game. The `/trusted_monitor/` sub-directory contains code specific to the Trusted Monitor game, including a further sub-directory devoted to our optimization code.

## Final Figure Notebooks

Should you want to replicate our results, we provide four Jupyter notebooks (found in `/final_figure_notebooks/`) that generate the various plots from the paper and also act as examples for how to use the codebase:

1. `figure2_frontier` - the main results from the paper comparing proposed protocols to the approximate Pareto frontier.

2. `figure3_trends` - how different modelling assumptions affect the safety and usefulness measures for different protocols.

3. `figure1b_distribution` - the trusted monitor distributions for both the original and weak settings.

4. `appendix_minimax_maximin` - the minimax and maximin values from the game set out in Appendix E.
