# Mimic Proxy

## Dependencies
Install this repository by running `pip install -e .`. We recommend using a virtual environment.

Then install the requirements by running `pip install -r requirements.txt`.

## Data processing

To get started, first edit `constants.py` to point to the directories holding your copies of the MIMIC-III datasets. Then, organize your data with the following structure:
```
mimicdata
|   D_ICD_DIAGNOSES.csv
|   D_ICD_PROCEDURES.csv
|   ICD9_descriptions (already in repo)
└───mimic3/
|   |   NOTEEVENTS.csv
|   |   DIAGNOSES_ICD.csv
|   |   PROCEDURES_ICD.csv
|   |   *_hadm_ids.csv (already in repo)
```
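Before running the notebook, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of this repository: the function name `missing_mimic_files` and the `EXPECTED_FILES` list are hypothetical, and the file names are copied from the tree.

```python
from pathlib import Path

# Expected files, copied from the directory tree above
# (paths are relative to the mimicdata/ directory).
EXPECTED_FILES = [
    "D_ICD_DIAGNOSES.csv",
    "D_ICD_PROCEDURES.csv",
    "ICD9_descriptions",
    "mimic3/NOTEEVENTS.csv",
    "mimic3/DIAGNOSES_ICD.csv",
    "mimic3/PROCEDURES_ICD.csv",
]


def missing_mimic_files(data_dir):
    """Return the expected paths (relative to data_dir) that are not present."""
    root = Path(data_dir)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).exists()]
    # The tree also lists *_hadm_ids.csv under mimic3/; require at least one match.
    if not list((root / "mimic3").glob("*_hadm_ids.csv")):
        missing.append("mimic3/*_hadm_ids.csv")
    return missing
```

Calling `missing_mimic_files("/path/to/mimicdata")` returns an empty list when everything is in place, and otherwise the relative paths that still need to be added.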

Now, make sure your Python path (`PYTHONPATH`) includes the base directory of this repository. Then, in Jupyter Notebook, run all cells (in the menu, click Cell -> Run All) in `notebooks/dataproc_mimic_III.ipynb`. This will take some time, so go for a walk or bake some cookies while you wait. You can speed it up by skipping the "Pre-train word embeddings" sections.
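If the notebook cannot import modules from this repository, one option is to add the base directory to `sys.path` in the first cell. This is a sketch under assumptions: the helper name `add_repo_to_path` is hypothetical, and the path passed to it should be wherever you cloned the repository. Setting the `PYTHONPATH` environment variable before launching Jupyter achieves the same thing.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put repo_root at the front of sys.path if it is not already listed."""
    resolved = str(Path(repo_root).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved
```

For example, `add_repo_to_path("..")` from a notebook in `notebooks/` would add the parent directory, assuming Jupyter was started with the notebook's folder as the working directory.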

## Training proxy model

To retrain the proxy model, run `python mimic_proxy.py`.

## Evaluating proxy model
There are two evaluations for the proxy model: (1) how faithful it is to CAML's model, and (2) how it performs on the true labels.

To test (1), how faithful the proxy is to CAML's model, run:
```
python scripts/calculate_faithfulness.py \
    /path/to/caml_scores \
    /path/to/test_full.csv \
    /path/to/vocab.csv \
    /path/to/outdir/scores.txt
```

To test (2), how the proxy model performs on the true labels, run:
```
python scripts/evaluate_proxy.py \
    /path/to/test_full.csv \
    /path/to/vocab.csv
```