## Code: Out-of-Distribution State Detection in Reinforcement Learning with Masksembles

This repository contains the code to reproduce the main results of the paper *Out-of-Distribution State Detection in Reinforcement Learning with Masksembles*.

For training, a single script trains all models across all environments. By default, three seeds are used.

## Prerequisites

In addition to the libraries listed in the requirements, we use adapted versions of StableBaselines3 and masksembles.
Both are included in this source code.
You may install them locally via ``

## Training

We used a customized and extended version of StableBaselines3 Zoo as the basis for this repository.

1. Install all dependencies. Make sure to also install the local copies of StableBaselines3 and masksembles.
   You may also need to install the dependencies for Atari and MuJoCo (see their respective documentation).

2. Run `ultra_script.py` to launch the full set of training runs. You can restrict, e.g., the environments or seeds inside the script. Trained models are saved in the `neurips_benchmarks` directory, which the evaluation scripts also reference.

Note that the full training run can take days to weeks to complete.
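The snippet below is a hypothetical sketch of how a sweep over environments and seeds can be organized, and how restricting it amounts to shortening the lists it iterates over. It is not the contents of `ultra_script.py`: the names `ENVS`, `SEEDS`, and `build_jobs`, as well as the concrete seed values, are assumptions for illustration only.

```python
from itertools import product

# Hypothetical configuration; the real environment list and seeds live in ultra_script.py.
ENVS = ["Ant-v3", "Walker2d-v3", "HalfCheetah-v3"]
SEEDS = [0, 1, 2]  # three seeds, mirroring the default number of seeds (values assumed)


def build_jobs(envs, seeds):
    """Return one (env_id, seed) pair per training run (hypothetical helper)."""
    return list(product(envs, seeds))


if __name__ == "__main__":
    # Restricting the sweep, e.g. to a single environment, just shortens the input list.
    for env_id, seed in build_jobs(["Walker2d-v3"], SEEDS):
        print(f"would train {env_id} with seed {seed}")
```

Enumerating all (environment, seed) pairs up front makes it easy to see how many runs a restricted configuration will launch before committing days of compute.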


## Evaluation

1. Install all dependencies from `requirements.txt` via pip or conda.

2. Run the evaluation notebooks
    - Three separate notebooks, one for each MuJoCo environment:
        - `Continuous_Evaluation_Ant.ipynb` for Ant-v3
        - `Continuous_Evaluation_Walker.ipynb` for Walker2d-v3
        - `Continuous_Evaluation_Cheetah.ipynb` for HalfCheetah-v3
    - For Atari, a single evaluation script generates the scores for all environments.

3. Run the ROC curve notebook
    - A single Jupyter notebook generates the ROC curves, the plots of the uncertainty measures, etc.
    - Configure the notebook at the top: fill in the environment for which you want to generate ROC curves.
    - As a small quirk, the ROC notebook has to be run multiple times to incorporate all seeds into the plots (a simple restart is sufficient).
      At the time of development, this was the best solution to ensure consistent seeding.
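To illustrate what the ROC evaluation measures, here is a minimal, stdlib-only sketch under stated assumptions; it is not the notebook's code. It assumes that each state gets an uncertainty score from the disagreement between the Masksembles members' outputs (here, the hypothetical `mask_uncertainty` uses the mean per-dimension variance, which may differ from the measures used in the paper), that OOD states are the positive class, and that per-seed AUCs are summarized by mean and standard deviation (`summarize_over_seeds` is likewise hypothetical).

```python
import statistics


def mask_uncertainty(member_outputs):
    """Hypothetical uncertainty score for one state.

    member_outputs: one output vector per mask member (e.g. Q-values or
    action means), all of the same length. Returns the mean variance across
    members, averaged over output dimensions.
    """
    n_dims = len(member_outputs[0])
    per_dim_var = [
        statistics.pvariance([m[d] for m in member_outputs]) for d in range(n_dims)
    ]
    return sum(per_dim_var) / n_dims


def roc_auc(id_scores, ood_scores):
    """ROC AUC for separating OOD (positive) from in-distribution (negative) states.

    Computed as the probability that a random OOD state scores higher than a
    random in-distribution state, with ties counted as 1/2 (Mann-Whitney U).
    Quadratic in the number of states, which is fine for a small sketch.
    """
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(id_scores))


def summarize_over_seeds(aucs):
    """Mean and sample standard deviation of per-seed AUCs (needs >= 2 seeds)."""
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Toy data: members agree on in-distribution states and disagree on OOD states.
    id_states = [[[1.0, 2.0], [1.0, 2.1]], [[0.5, 0.5], [0.6, 0.5]]]
    ood_states = [[[0.0, 3.0], [2.0, 1.0]], [[1.0, 1.0], [3.0, -1.0]]]
    id_scores = [mask_uncertainty(s) for s in id_states]
    ood_scores = [mask_uncertainty(s) for s in ood_states]
    print("AUC:", roc_auc(id_scores, ood_scores))
```

An AUC of 1.0 means every OOD state receives a higher uncertainty score than every in-distribution state, while 0.5 corresponds to chance; reporting the mean and spread over seeds mirrors the need to combine all seeds in the plots.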