# Uncertainty Quantification in Regression Using Proper Scoring Rules


## Installation

```
conda env create -f environment.yml
```

For reproducibility, the important version pins are python=3.12, pytorch=2.4, and torchvision=0.19.

## Setting up the datasets

For the toy tasks dots and arrows, execute the notebooks [notebooks/check_dots.ipynb](./notebooks/check_dots.ipynb) and [notebooks/check_arrows.ipynb](./notebooks/check_arrows.ipynb).
Both call the `generate_and_save_data` function of the respective task, which creates the datasets needed for training and testing.
For Cityscapes, first follow the instructions in [datasets/README.md](datasets/README.md), then execute [datasets/generate_cityscapes.ipynb](datasets/generate_cityscapes.ipynb) to preprocess the dataset for training.

## Training and Evaluation

To train models, call [train.py](train.py).
Example configs can be seen in [run_experiment.sh](run_experiment.sh).

Once these runs have finished, obtain the probs on the test dataset for the individual ensemble models and save the test targets for convenience.
This is done by calling [evaluate.py](evaluate.py).
All configs can be seen in [run_evaluation.sh](run_evaluation.sh).

To explore the difference between the usual parameterization of the Gaussian likelihood and the natural parameterization, see the toy sine example in [toy_regression.ipynb](toy_regression.ipynb).
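As a rough illustration of the two parameterizations, the sketch below evaluates the Gaussian negative log-likelihood once with mean and variance and once with the natural parameters `eta1 = mu / var` and `eta2 = -1 / (2 * var)`. The function names are hypothetical and not taken from the notebook; this only shows that both forms give the same value.

```python
import math

def nll_standard(y, mu, var):
    """Gaussian negative log-likelihood in the usual (mean, variance) form."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

def to_natural(mu, var):
    """Map (mean, variance) to natural parameters (eta1, eta2), with eta2 < 0."""
    return mu / var, -0.5 / var

def nll_natural(y, eta1, eta2):
    """The same negative log-likelihood written in natural parameters."""
    # Log-partition function of the Gaussian in exponential-family form.
    log_partition = -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)
    return -(eta1 * y + eta2 * y ** 2) + log_partition + 0.5 * math.log(2 * math.pi)

# Both forms agree for the same underlying distribution.
eta1, eta2 = to_natural(mu=1.5, var=0.25)
assert math.isclose(nll_standard(2.0, 1.5, 0.25), nll_natural(2.0, eta1, eta2))
```

In the natural form a network would predict `eta1` and `eta2` directly, with `eta2` constrained to be negative; how the notebook enforces that constraint is not shown here.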

## Analysis

Currently, there are three types of analysis.

First, testing the uncertainty measures on synthetically drawn means and variances in [test_uncertainty_measures.ipynb](test_uncertainty_measures.ipynb).
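To give a feel for this kind of test, here is a minimal sketch that draws synthetic means and variances for a few ensemble members and splits the total predictive variance with the law of total variance. This variance-based decomposition is a stand-in chosen for illustration; the measures in the notebook may differ, and the name `decompose_variance` is hypothetical.

```python
import random
from statistics import fmean, pvariance

def decompose_variance(means, variances):
    """Split total predictive variance via the law of total variance."""
    aleatoric = fmean(variances)   # average of the members' predicted variances
    epistemic = pvariance(means)   # spread of the members' predicted means
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical synthetic draw: 5 ensemble members for a single input.
rng = random.Random(0)
means = [rng.gauss(0.0, 1.0) for _ in range(5)]
variances = [rng.uniform(0.1, 2.0) for _ in range(5)]

aleatoric, epistemic, total = decompose_variance(means, variances)
print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```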

Second, evaluating on the synthetic dots gap tail task in [analyze_dots_gap_tail.ipynb](analyze_dots_gap_tail.ipynb).

Finally, selective prediction on the individual datasets in [analyze_selective_prediction.ipynb](analyze_selective_prediction.ipynb).
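Selective prediction can be sketched as follows: rank samples by an uncertainty score, keep only the most certain fraction (the coverage), and report the mean error on what is kept. The helper `risk_at_coverage` and the toy numbers are assumptions for illustration, not the notebook's implementation.

```python
import math

def risk_at_coverage(errors, uncertainties, coverage):
    """Mean error over the `coverage` fraction of samples with lowest uncertainty."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = math.ceil(coverage * len(errors))
    # Indices sorted from most certain to least certain.
    order = sorted(range(len(errors)), key=lambda i: uncertainties[i])
    kept = order[:n_keep]
    return sum(errors[i] for i in kept) / n_keep

# Toy per-sample errors and uncertainty scores (made up for this example).
errors = [1.0, 4.0, 9.0, 16.0]
uncertainties = [0.1, 0.4, 0.2, 0.9]
for coverage in (0.25, 0.5, 0.75, 1.0):
    print(coverage, risk_at_coverage(errors, uncertainties, coverage))
```

A useful uncertainty measure should make the risk drop as the coverage shrinks, since the samples rejected first should be the ones with the largest errors.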

The usual setup in these notebooks is that the first cell contains the necessary imports and the second cell holds all configurable settings for the experiments.


## Roadmap

Ideas for experiments

### Datasets

* 1D
    * Synthetic regression task - just for validating losses!
    * Folktables (ACSIncome & ACSTravelTime)
* 2D
    * Synthetic dots
    * Synthetic arrows
    * Cityscapes -> extract 256x256 tiles from the 2048x1024 images (8x4 grid) and center crop each to 224x224; 3750 raw images are available
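The Cityscapes tiling idea can be sketched with plain arithmetic. The snippet below assumes the tiles do not overlap and that the center crop is applied to each tile separately; `tile_boxes` is a hypothetical helper, and the boxes use the `(left, upper, right, lower)` convention.

```python
def tile_boxes(width=2048, height=1024, tile=256, crop=224):
    """Return center-crop boxes, one per non-overlapping tile, in row-major order."""
    offset = (tile - crop) // 2  # 16 pixels trimmed on every side of a tile
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left + offset, top + offset,
                          left + offset + crop, top + offset + crop))
    return boxes

boxes = tile_boxes()
print(len(boxes))   # 8 columns x 4 rows of tiles per image
print(boxes[0])     # crop box inside the top-left tile
```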

### Uncertainty Methods

* Deep Ensembles
* Anti-Regularized Deep Ensembles
* SG-MCMC
* MCD
* HMC?

### Tasks

* Toy task (Dots with gap and tail) to debug behaviour of measures
* Selective prediction
* Disentanglement of measures
* OOD detection / Perturbation? (could be synthetic, similar to CIFAR10-C)
* Active learning?? (resource intensive)

