# Instructions for running the experiments on MIMIC, FOREST, and BIO

## Step 0: Download the datasets:
For MIMIC, follow the [instructions](https://physionet.org/content/mimic-cxr/2.0.0/) at the bottom of the page to gain access to the dataset.

For FOREST, download `covtype.data` from [here](https://archive-beta.ics.uci.edu/dataset/31/covertype).

For BIO, download the training set (`data_set_ALL_AML_train.csv`), the test set (`data_set_ALL_AML_independent.csv`), and the set of AML/ALL labels (`actual.csv`) from [here](https://www.kaggle.com/datasets/crawford/gene-expression).

## MIMIC:

To run the experiments on model logits, run
```
python mimic.py \
--data_dir /path/to/model_logits.pkl \
--n_samples 100 \
--eps_list '0.01,0.05,0.1,0.2' \
--iters 100 \
--n_steps 10 \
--n_rep 500 \
--biased_mmd False
```
Here, `/path/to/model_logits.pkl` is a pickle file of the form `[data_from_x, data_from_y]`, where `data_from_x` and `data_from_y` are NumPy arrays holding the logits for the two groups being compared.
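
As a rough illustration of that file layout, the sketch below writes a two-element list of NumPy arrays to a pickle file and reads it back. The array shapes, the random placeholder values, and the output location are assumptions made for the example. They are not taken from `mimic.py`, so substitute your own logits and path.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Placeholder logits standing in for real model outputs.
# The shapes (200 x 14 and 150 x 14) are illustrative assumptions.
data_from_x = rng.normal(size=(200, 14))
data_from_y = rng.normal(size=(150, 14))

# Hypothetical output location; this is the path you would pass to --data_dir.
out_path = os.path.join(tempfile.gettempdir(), "model_logits.pkl")
with open(out_path, "wb") as f:
    pickle.dump([data_from_x, data_from_y], f)

# Sanity check: reload the file and confirm the two-element list layout.
with open(out_path, "rb") as f:
    loaded = pickle.load(f)
```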

This generates the bounds for all methods under simulated epsilon corruption, once for each value in `eps_list`. `eps_list` should be a single string of comma-separated epsilon contamination values, such as `'0.01,0.05,0.1,0.2'`.
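
To make the expected `eps_list` format concrete, here is a minimal sketch of how such a string could be split into floats. `parse_eps_list` is a hypothetical helper written for this example; the scripts may parse the argument differently.

```python
def parse_eps_list(eps_list: str) -> list[float]:
    """Split a comma-separated string into a list of floats.

    Hypothetical helper for illustration; not taken from the scripts.
    """
    # Strip whitespace around each entry and skip empty entries
    # (for example, from a trailing comma).
    return [float(tok) for tok in eps_list.split(",") if tok.strip()]


eps_values = parse_eps_list("0.01,0.05,0.1,0.2")
```

Quoting the whole list on the command line, as in the command above, keeps the shell from splitting it into separate arguments.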

## FOREST:
To run the experiments on FOREST covertype data, run
```
python forest.py \
--data_dir /path/to/covtype.data \
--n_samples 100 \
--eps_list '0.01,0.05,0.1,0.2' \
--iters 100 \
--n_steps 10 \
--n_rep 500 \
--biased_mmd False
```

## BIO:
To run the experiments on BIO leukemia gene expression, run
```
python bio.py \
--train_dir "/path/to/cancer_data/data_set_ALL_AML_train.csv" \
--test_dir "/path/to/cancer_data/data_set_ALL_AML_independent.csv" \
--label_dir "/path/to/cancer_data/actual.csv" \
--eps_list '0.01,0.05,0.1,0.15' \
--iters 100 \
--n_steps 10 \
--n_rep 500 \
--biased_mmd False
```

