# Data

The two datasets are provided in the `datasets` directory.

- <b>German Credit Dataset</b>
    - Link: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data
    - Features used: `duration`, `amount`, `age`, `personal_status_sex`.

- <b>Small Business Dataset</b>
    - Link: https://www.kaggle.com/datasets/larsen0966/sba-loans-case-data-set
    - Features used: `Zip`, `NAICS`, `ApprovalDate`, `ApprovalFY`, `Term`, `NoEmp`, `NewExist`, `CreateJob`, `RetainedJob`, `FranchiseCode`, `UrbanRural`, `RevLineCr`, `ChgOffDate`, `DisbursementDate`, `DisbursementGross`, `ChgOffPrinGr`, `GrAppv`, `SBA_Appv`, `New`, `RealEstate`, `Portion`, `Recession`, `daysterm`, `xx`.

### Preprocessing
We one-hot encode the non-numeric features and then normalize all features of the real-world data.
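
The two preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the code used in the notebooks: the `preprocess` helper and the toy rows are hypothetical, and min-max scaling to [0, 1] is an assumption, since the exact normalization is not specified here (z-score standardization would slot in the same way).

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode non-numeric columns, then min-max scale every column."""
    # Treat every column that is not numeric as categorical.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    encoded = pd.get_dummies(df, columns=categorical, dtype=float)

    # Assumption: min-max scaling to [0, 1]; the normalization actually
    # used in the experiments may differ.
    col_min = encoded.min()
    col_range = (encoded.max() - col_min).replace(0.0, 1.0)  # guard constant columns
    return (encoded - col_min) / col_range


# Hypothetical toy rows using the German Credit feature names listed above.
toy = pd.DataFrame(
    {
        "duration": [6, 48, 12, 42],
        "amount": [1169, 5951, 2096, 7882],
        "age": [67, 22, 49, 45],
        "personal_status_sex": ["A93", "A92", "A93", "A93"],
    }
)
features = preprocess(toy)
print(features.columns.tolist())
```

Encoding before scaling keeps the indicator columns in the same [0, 1] range as the scaled numeric features.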


# Code

### Requirements
A `requirements.txt` file is provided.

Run `pip install -r requirements.txt` to install the dependencies required to run the experiments.
To run the robustness vs. consistency tradeoff experiments for the linear setting (`run_tradeoffs_cv_lr.ipynb`), first generate the predictions by running `find_theta_r_linear.ipynb`, which saves them to the `theta_preds` directory.
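
As a quick sanity check before opening the tradeoff notebook, something like the sketch below can confirm that saved predictions are present. The `predictions_available` helper is hypothetical and not part of the codebase. It only checks that `theta_preds` contains at least one file, because the names of the saved files are not documented here.

```python
from pathlib import Path


def predictions_available(pred_dir: str = "theta_preds") -> bool:
    """Return True if pred_dir exists and contains at least one file."""
    path = Path(pred_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


if not predictions_available():
    print("No saved predictions found; run find_theta_r_linear.ipynb first.")
```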

### Experiments
The code to run the experiments is in the notebooks whose names start with "run". You can run all the cells; at the end, each notebook saves its generated results to the results directory. These saved results can then be visualized by running the notebooks whose names start with "results" followed by the corresponding experiment.
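
To see which notebooks fall into each group, a small helper along these lines can list them by prefix. This is only a sketch: `notebooks_with_prefix` is a hypothetical name, and it assumes the notebooks sit in the directory you run it from.

```python
from pathlib import Path


def notebooks_with_prefix(prefix: str, root: str = ".") -> list[str]:
    """Return the sorted names of notebooks in root that start with prefix."""
    return sorted(p.name for p in Path(root).glob(f"{prefix}*.ipynb"))


print("Experiment notebooks:", notebooks_with_prefix("run"))
print("Visualization notebooks:", notebooks_with_prefix("results"))
```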

Inside each of these notebooks, you can choose the dataset and the `base_model`.

Please note that the experiments on the effects of parameters take around 10-12 hours to run.