# DeepVent

<!-- TABLE OF CONTENTS -->
## Table of Contents

* [About the Project](#about-the-project)
* [Installation](#installation)
* [Processing Data](#processing-data)
* [Training policies](#training-policies)
* [Evaluating policies](#evaluation)

To reproduce the results, please follow the guidelines starting from the "Installation" section.

If you want to modify how the data is processed, read the [Processing Data](#processing-data) section. We have provided preprocessed data within the `Data` folder, which is available here:

https://drive.google.com/file/d/1HM4xEjqVIVUekJH-BDrrg90FE7ym98W8/view?usp=sharing 

If you want to use the already preprocessed data but train your own policies, skip to the [Training policies](#training-policies) section where we provide instructions on how to train policies. We have also provided pre-trained policies in the `d3rlpy/FINAL_POLICIES` folder. The d3rlpy folder is provided here: 

https://drive.google.com/file/d/1024JDMAY_9kAXFmICuP9E6NK_RDdn5Po/view?usp=sharing

If you want to use the pre-trained policies to generate the plots in the DeepVent paper, skip to the [Evaluating policies](#evaluation) section. 

Note: Many packages still do not support the Apple M1 chip, so you may run into compatibility issues when running on an M1 machine.

<!-- ABOUT THE PROJECT -->
## About the Project 
Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Nonetheless, the optimal treatment regime is often unknown, leading to sub-optimal care and increased risks of complications. This work aims to develop a decision support tool to personalize mechanical ventilation. We present DeepVent, an off-policy deep reinforcement learning model that determines the best ventilator settings throughout a patient's stay. We evaluate our model using Fitted Q Evaluation, and show that it is predicted to outperform physicians. Moreover, we address the challenge of policy value overestimation in out-of-distribution settings using Conservative Q-Learning and show that it leads to safe recommendations for patients. We also design an intermediate reward based on the Apache II score to further improve our model's performance.

<!-- INSTALLATION -->
## Installation
1. Go to the parent directory 
```
cd DeepVent
```
2. Create and activate virtual environment 

Linux:
```sh
python -m venv env
source env/bin/activate
```
Windows (run this from the Command Prompt, not PowerShell):
```
python -m venv env
.\env\Scripts\activate.bat
```
3. Install the required libraries
```
pip install -r requirements.txt 
```
4. Install the root package (run this from the root directory of the repository, i.e. the directory that contains `data_preprocessing`, `evaluation`, etc.)
```
pip install -e .
```
5. Install PyTorch with CUDA capabilities

(Only needed if you want to train your own policies. If you are using the provided pre-trained policies, you can skip this step. Without a CUDA-compatible GPU, you cannot train your own policies.)

Go to https://pytorch.org/get-started/locally/ and follow the instructions to install PyTorch with CUDA capabilities (using **pip**) for your OS and CUDA version.
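
After installing, it can help to confirm that PyTorch actually sees a CUDA device before starting a long training run. The snippet below is a minimal sketch, not part of the repository; `cuda_status` is a hypothetical helper name, and it only uses the standard `torch.cuda.is_available()` and `torch.cuda.get_device_name()` calls.

```python
def cuda_status():
    """Return a short description of the local PyTorch/CUDA setup."""
    try:
        # Imported lazily so the check also works before PyTorch is installed.
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Device 0 is the first GPU visible to PyTorch.
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device was found"


if __name__ == "__main__":
    print(cuda_status())
```

If the last message is printed, the installed PyTorch build is probably CPU-only or does not match your CUDA version.
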
<!-- PREPROCESSING DATA -->
## Processing Data
IMPORTANT: If you are using the provided policies to reproduce the graphs from the paper, go to `data_preprocessing/run.py` and comment out the line `import train_test_split` before running `run.py`. Instead, copy the `indices` folder from the provided `data` folder into your `data` folder. This ensures that you use the same train-test splits that were used to train the provided policies.

The data processing folders perform the following steps:
1. Data Extraction
2. Data Imputation
3. Compute Trajectories
4. Modify elements
5. Split the data and create OOD

<!-- The following folders are extra steps which can be taken to potentially improve the performance of the policy. They do not replace the files containing the raw states, actions and rewards but simply create extra files where the data has been processed even more.
* discretize_actions: Turns 3-tuple of actions into 1 single number for each action
* remove_intermediate reward: Removes intermediate reward -->

To run the data preprocessing: 
1. Obtain the raw data by following the instructions in the `data_preprocessing/data_extraction` folder. You will need to insert your own path in the scripts. Because data extraction from MIMIC-III takes hours of processing, we have provided the raw data retrieved from MIMIC-III as `december_table_projectx.csv` in the `data` directory.
2. From the parent directory, run the data preprocessing:
```
python3 data_preprocessing/run.py
```

Note: A more detailed description of each data preprocessing step is provided in the `data_preprocessing` folder.
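
When reproducing the paper's graphs with the provided policies, the `indices` folder from the provided data has to be in your own `data` folder before `run.py` is executed. Copying it by hand works fine; the following is only a minimal sketch of the same step in Python. `copy_indices` and the example paths are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def copy_indices(provided_data_dir, local_data_dir):
    """Copy the provided `indices` folder into the local data folder."""
    src = Path(provided_data_dir) / "indices"
    dst = Path(local_data_dir) / "indices"
    if not src.is_dir():
        raise FileNotFoundError(f"No indices folder found at {src}")
    # dirs_exist_ok=True (Python 3.8+) lets a repeated copy overwrite old files.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example call with placeholder paths (adjust both to your own setup):
# copy_indices("downloads/Data", "data")
```
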


<!-- TRAINING POLICIES -->
## Training policies
1. The preprocessed data is provided in the `data` folder. Alternatively, the dataset can be extracted from the MIMIC database by following the instructions in the `data_extraction` folder.
2. To find the optimal hyperparameters, run a grid search using:
```
python3 training/find_cql.py 
python3 training/find_dqn.py
```
3. Train the policy. Edit the values of the hyperparameters such as `LEARNING_RATE`, `N_EPOCHS`, etc. within the script. The given values are the optimal values found for this problem. The path to the policy weights for each epoch is printed to the console. The folder containing the policy weights also contains CSV files with all of the policy's metrics at each epoch.
```
python3 training/train_eval_loop.py
```
Note: Each run was done with a different train-test split and took around 14 hours to complete. 

Running the above scripts will generate outputs in the `d3rlpy_logs` folder.

4. Then run `python3 training/get_all_final_policies.py` to get all policies in the correct format for evaluation, modifying `run_num`, `model_num` and `fqe_model_num` to match the parameters you set in `training/train_eval_loop.py` in step 3.
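
To decide which epoch to look at, it can be useful to scan the per-epoch metric CSV files written next to the policy weights. The sketch below assumes that each metric file holds one `epoch,step,value` row per epoch with no header line; that layout, the helper name `best_epoch` and the file name in the example are assumptions, so check them against your own `d3rlpy_logs` output. Whether the lowest value is the right selection criterion depends on the metric.

```python
import csv


def best_epoch(csv_path, lower_is_better=True):
    """Return (epoch, value) for the row with the best metric value."""
    rows = []
    with open(csv_path, newline="") as f:
        for record in csv.reader(f):
            if len(record) < 3:
                continue  # skip blank or malformed lines
            epoch, _step, value = record[:3]
            rows.append((int(epoch), float(value)))
    if not rows:
        raise ValueError(f"No metric rows found in {csv_path}")
    pick = min if lower_is_better else max
    return pick(rows, key=lambda row: row[1])


# Example call with a placeholder path:
# epoch, value = best_epoch("d3rlpy_logs/<run_folder>/loss.csv")
```
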

<!-- EVALUATING POLICIES -->
## Evaluation

The `evaluation` directory contains all the code necessary to generate the graphs and results from the DeepVent paper. It requires 5 runs each of CQL without intermediate reward (DeepVent-), CQL with intermediate reward (DeepVent) and DDQN in the final policies directory defined in `constants.py`.

To simplify things, we have provided the trained policies for this problem.
These policies are in the `d3rlpy_logs` folder.

If you encounter any bugs while running files in the `evaluation` directory, try running `train_test_split.py` in the `data_preprocessing` folder.

To run the evaluation scripts:
```
cd evaluation
python3 compare_policies.py
python3 percent_each_setting_close_to_physician.py
python3 grouped_action_bar_plot.py
python3 make_u_curves.py
python3 compare_ood_id.py
```

The files to run for each table and figure in the DeepVent paper are:

- Table 1: `compare_policies.py`

<div align="center">
  
| PHYSICIAN | DEEPVENT-  | DEEPVENT  |
| :-----: | :-: | :-: |
| 0.502 | 0.762 | 0.797 |
  
</div>


- Figure 1: `percent_each_setting_close_to_physician.py`

<p align="center">
    <img src="evaluation/original_graphs/similarity_physicians5runs.png" alt="similarity_physicians5runs" width="350" height="300">
</p>
<p align="center">
  Figure 1. Percentage of states for which the algorithm's recommendation is within one bin of the physician's recommendation. Compared to DDQN, DeepVent suggests actions more similar to the physicians'.
</p>
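
The quantity in Figure 1 can be illustrated with a few lines of Python. This is a minimal sketch under the assumption that each action is given as an integer bin index for a single ventilator setting; `percent_within_one_bin` is a hypothetical helper and not the implementation in `percent_each_setting_close_to_physician.py`.

```python
def percent_within_one_bin(policy_bins, physician_bins):
    """Percentage of states where the policy's bin is at most one bin away
    from the physician's bin."""
    if len(policy_bins) != len(physician_bins):
        raise ValueError("Both sequences must have the same length")
    if not policy_bins:
        raise ValueError("At least one state is required")
    close = sum(
        1 for rl, doc in zip(policy_bins, physician_bins) if abs(rl - doc) <= 1
    )
    return 100.0 * close / len(policy_bins)


# Toy data: the bin differences are 1, 0, 2 and 0, so 3 of 4 states count.
print(percent_within_one_bin([0, 1, 2, 6], [1, 1, 4, 6]))  # 75.0
```
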

- Figure 2: `grouped_action_bar_plot.py`
<p align="center">
    <img src="evaluation/original_graphs/gabpFinal.png" alt="gabpFinal" width="600" height="450">
</p>
<p align="center">
  Figure 2. Distribution of actions across ventilator settings. Unlike DDQN, DeepVent makes recommendations in a safe and clinically relevant range for each setting.
</p>

- Figure 3: `make_u_curves.py`
<p align="center">
    <img src="evaluation/original_graphs/ucurdeepvent.png" alt="ucurdeepvent" width="375" height="300">
</p>
<p align="center">
  Figure 3. U-curves for DDQN and DeepVent policies. Observed mortality as a function of the difference in actions between the RL algorithm and the physician (calculated as the bin of the RL action minus the bin of the physician action). The lowest mortality for DeepVent is observed when it picks actions in the same bin as physicians, strengthening the conclusion that DeepVent choosing actions close to physicians leads to higher survival.
</p>
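
The U-curve computation described in the caption can be sketched as follows: group the states by the bin difference (RL bin minus physician bin) and take the observed mortality within each group. The function name `u_curve` and the input layout (integer bins plus a 0/1 mortality flag per state) are assumptions for illustration, not the code in `make_u_curves.py`.

```python
from collections import defaultdict


def u_curve(policy_bins, physician_bins, died):
    """Map each bin difference (RL bin - physician bin) to observed mortality."""
    totals = defaultdict(lambda: [0, 0])  # difference -> [deaths, count]
    for rl, doc, outcome in zip(policy_bins, physician_bins, died):
        diff = rl - doc
        totals[diff][0] += int(outcome)
        totals[diff][1] += 1
    # Sort by difference so the result reads left to right like the plot's x-axis.
    return {diff: deaths / count for diff, (deaths, count) in sorted(totals.items())}


# Toy data: two states with difference 0 (one death), one each at -2 and +2.
print(u_curve([1, 1, 3, 0], [1, 1, 1, 2], [0, 1, 1, 1]))
# {-2: 1.0, 0: 0.5, 2: 1.0}
```
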

- Figure 4: `compare_ood_id.py`
<p align="center">
    <img src="evaluation/original_graphs/ddqn_deepvent_ood_vs_id5runs.png" alt="ddqn_deepvent_ood_vs_id5runs" width="350" height="300">
</p>
<p align="center">
  Figure 4. Mean initial Q-values in both in-distribution and out-of-distribution (OOD) settings for DeepVent and DDQN (with variances; DeepVent's variance is not shown because it is too small). The horizontal line is the maximum expected return per episode. In contrast to DeepVent, DDQN clearly suffers from overestimation, which is aggravated in the OOD setting.
</p>
<!-- 1. Copy the relative path (starting from the root of the repository) of the policy you want to evaluate (.pt file)
2. At the top of "evaluate_policy.py", set the POLICY_PATH variable to that relative path
3. Choose values for the other hyperparameters such as LEARNING_RATE, N_EPOCHS, etc. I have put default values which should be pretty okay but could be far from optimal so it's probably good to experiment
4. Run the file. The path to the weights for the Fitted Q Evaluation for each epoch will be output in the console. In the same folder as the policy weights you can find the csv files for all the metrics for the fitted Q evaluation at each epoch. To get the final metrics for a policy just take the metrics corresponding to the epoch of the fitted Q evaluation with the lowest loss. -->
