This code accompanies the paper submission: Behavior-Aware Off-Policy Selection in High-Stake Human-Centric Environments

This folder contains the code to (1) prepare the data used to train models that capture critical behavioral patterns; (2) perform adaptive policy evaluation and selection with less uncertainty.

We include the data generation code and the training and evaluation code of HBO for the sepsis treatment experiments.

Folders/Files (under sepsis)
HBO_sepsis.ipynb: HBO algorithm using critical behavioral patterns for online evaluation
learn_policies_varied_target_policy.ipynb: code from Namkoong et al. (2020) for learning policies.

Folders/Files (under edu)
HBO_edu.ipynb: HBO algorithm using critical behavioral patterns for online evaluation
augmented_data -- contains the generated synthetic trajectories
processed_data -- contains the prepared data in sharable format
raw_data -- contains original trajectories
saved_augmented_data -- contains augmented trajectories
saved_dist -- stores (pre-trained) policy checkpoints that can be loaded as policies
saved_models -- stores (pre-trained) checkpoints that can be loaded as trajectory generation models
model -- stores policies
cluster_data -- stores subgroup-specific data
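
A minimal sketch for checking that the folders listed above exist before running HBO_edu.ipynb. The helper name missing_edu_folders and the idea of passing it the path to the edu folder are illustrative assumptions, not part of the released code.

```python
from pathlib import Path

# Folder names copied from the list above.
EDU_FOLDERS = [
    "augmented_data",
    "processed_data",
    "raw_data",
    "saved_augmented_data",
    "saved_dist",
    "saved_models",
    "model",
    "cluster_data",
]


def missing_edu_folders(edu_root):
    """Return the expected folders that do not exist under edu_root.

    Hypothetical helper: it only checks for directories and does not
    inspect their contents.
    """
    root = Path(edu_root)
    return [name for name in EDU_FOLDERS if not (root / name).is_dir()]
```

Calling missing_edu_folders("edu") from the top-level folder returns an empty list when every folder is in place.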


############################################
Experiments
############################################

Step 1. Execute learn_policies_varied_target_policy.ipynb (under sepsis)
Step 2. Execute HBO_sepsis.ipynb (under sepsis)
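
The two steps can also be run without opening the notebooks interactively. The sketch below assumes that Jupyter with nbconvert is installed, that it is launched from the folder containing sepsis, and that both notebooks run unattended; the helper names build_commands and run_all are hypothetical. Note that --inplace overwrites each notebook with its executed version.

```python
import subprocess

# Notebook order taken from Step 1 and Step 2 above.
NOTEBOOKS = [
    "learn_policies_varied_target_policy.ipynb",
    "HBO_sepsis.ipynb",
]


def build_commands(notebooks=NOTEBOOKS):
    """Build one `jupyter nbconvert` command per notebook, in order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_all(workdir="sepsis"):
    """Execute the notebooks in order inside workdir (assumed location).

    check=True raises CalledProcessError at the first failing notebook,
    so Step 2 never runs on top of a failed Step 1.
    """
    for cmd in build_commands():
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling run_all() performs both steps in sequence.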

Dependencies:
Python 3
tensorflow 1.15.0
gym 0.21.0
numpy 1.21.2
pandas 1.3.5
csv 1.0 (part of the Python standard library)
scikit-learn 1.0.2 (imported as sklearn)
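
A minimal sketch for comparing the installed packages against the versions pinned above. The helper names are hypothetical, and it assumes the packages were installed under their usual PyPI distribution names (for example, a tensorflow-gpu install would be reported as missing).

```python
# Pinned versions copied from the list above. sklearn is published on PyPI
# as scikit-learn; csv ships with Python itself and is therefore skipped.
EXPECTED = {
    "tensorflow": "1.15.0",
    "gym": "0.21.0",
    "numpy": "1.21.2",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.2",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters (setuptools)

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    result = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            result[name] = (want, found)
    return result
```

An empty dictionary from version_mismatches() means the environment matches the pinned versions; a found value of None means the package is not installed.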



****************************************************************************************************
----------------------------------------------------------------------------------------------------
Due to the IRB protocol, neither the students' original data nor anonymized data derived from it is included in this folder.
----------------------------------------------------------------------------------------------------
****************************************************************************************************
