# Auditing Predictive Models for Intersectional Biases

This repository contains the supplementary material needed to reproduce the methods and findings in Auditing Predictive Models for Intersectional Biases. Specifically, it contains the code for (1) Conditional Bias Scan, (2) the simulations, and (3) permutation testing.

## Requirements

We provide the package requirements for an Anaconda environment that can run all the code in this repository in 'environment.yml'. We used an Anaconda environment with Python 3.7.4.

## Running Conditional Bias Scan (CBS) on COMPAS Data

To run CBS on the COMPAS data, run the commands below:

```run CBS for COMPAS
cd Runner_files
python Runner-COMPAS.py
```

The COMPAS data is located at 'data/COMPAS.csv'. This is the code and dataset used to produce the results in Section 5 (Case Study of COMPAS).

To adapt CBS to another dataset, change the settings in the following files:
1. settings for dataset and how to preprocess it (observed outcome, scores, MLE, etc): 'dataset_specific_yamls/compas.yaml'
2. settings for CBS run (scan types, value-conditional info, direction, penalty, iterations, threshold information): 'fsscan_yamls/fsscan_configs-COMPAS.yaml'
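
One way to start on a new dataset is to copy the two COMPAS settings files and edit the copies. The sketch below assumes both directories sit under a common base directory and that new files follow the same naming pattern as the COMPAS ones. The function name `copy_config_templates` and the dataset name `my_dataset` are hypothetical, and the runner script would still need its file paths pointed at the new files.

```python
import shutil
from pathlib import Path


def copy_config_templates(base_dir, new_name):
    """Copy the two COMPAS settings files as starting points for a new dataset.

    Assumption: 'dataset_specific_yamls' and 'fsscan_yamls' both live under
    base_dir. The target file names are illustrative, not a repository rule.
    """
    base = Path(base_dir)
    templates = {
        base / "dataset_specific_yamls" / "compas.yaml":
            base / "dataset_specific_yamls" / f"{new_name}.yaml",
        base / "fsscan_yamls" / "fsscan_configs-COMPAS.yaml":
            base / "fsscan_yamls" / f"fsscan_configs-{new_name}.yaml",
    }
    created = []
    for source, target in templates.items():
        # Refuse to overwrite settings that already exist.
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

After copying, edit the new files so that the observed outcome, scores, and scan settings match the new dataset.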

The full results will be written to 'Runner_files/results'.

We provide all the settings files, data files, and generated risk predictions for the German Credit data. To run CBS on this data, change the file paths in the code to the equivalent German Credit files in the same directories.



## Evaluation (Simulation Tests)

To run all the simulations in Section 4 (Evaluation) and in Appendices B.3 and B.4, use the code below:

```run simulation tests
cd Benchmark_Tests
python simulation_general_mu_suff-for-HPC_non_additive_shift.py
sbatch cbs-run.sbatch
```

To adapt the simulations to another dataset, change the settings in the following files:
1. how to preprocess the data (observed outcome, scores, MLE, etc): 'dataset_specific_yamls/compas-benchmark.yaml'
2. settings for CBS run (scan types, value-conditional info, direction, penalty, iterations, threshold information): 'fsscan_yamls/fsscan_configs-CBS_benchmark.yaml'

Please note that certain modifications needed to be made to CBS to run these simulations, such as reading in the dataset directly in the code and defining the protected class in the code rather than in configuration files. Also, the code we used for competing methods is not included because it is not original code, but the modifications to their methods described in Appendix B.1 are detailed enough to reproduce their simulations using their source code on GitHub. Lastly, this code is written to run on an HPC instance. The Python file generates the data, and the sbatch file runs the singular-task-for-HPC Python file.
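
For readers unfamiliar with this split between a data-generating script and a per-task script, the sketch below shows one common pattern: each Slurm array task reads `SLURM_ARRAY_TASK_ID` and picks one combination from a parameter grid. Whether 'cbs-run.sbatch' uses a job array is an assumption, and the grid values and the function name `settings_for_task` are hypothetical, not taken from this repository.

```python
import itertools
import os

# Hypothetical parameter grid; the real simulation settings are defined
# in the repository's scripts and configuration files.
GRID = {
    "shift": [0.5, 1.0, 2.0],
    "seed": [0, 1],
}


def settings_for_task(task_id, grid=GRID):
    """Map a zero-based task index to one combination of grid values."""
    keys = sorted(grid)  # fixed key order so every task sees the same mapping
    combos = list(itertools.product(*(grid[k] for k in keys)))
    if not 0 <= task_id < len(combos):
        raise IndexError(f"task_id {task_id} is outside 0..{len(combos) - 1}")
    return dict(zip(keys, combos[task_id]))


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID for each task of a job array;
    # fall back to 0 when running outside Slurm.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(settings_for_task(task_id))
```

With this layout, a job array sized to the number of grid combinations runs every setting exactly once.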

The full results will be written to 'Benchmark_Tests/Benchmark_results'.

To run only the simulations presented in Appendix B.3 for the additive signal in the log-odds, run the following (this is not an HPC implementation):

```run simulation tests
cd Benchmark_Tests
python simulation_general_mu_suff.py
```

For the code for the benchmark methodologies, refer to the following GitHub repositories:
GerryFair - https://github.com/algowatchpenn/GerryFair
MultiAccuracy Boost - https://github.com/amiratag/MultiAccuracyBoost

## Permutation Testing

To run the permutation testing on the COMPAS and German Credit data, which was used to obtain the statistical significance indicators in Section 5 and Appendix C.2 (Case Study of COMPAS, Case Study of German Credit) and is described in Appendix A.3, use the code below:

```run permutation tests
cd Permutation_Testing
python permutation_testing.py
```

To adapt the permutation testing to another dataset, change the settings in the following files:
1. settings for dataset and how to preprocess it (observed outcome, scores, MLE, etc): 'dataset_specific_yamls/compas.yaml'
2. settings for CBS run (scan types, value-conditional info, direction, penalty, iterations, threshold information): 'fsscan_yamls/fsscan_configs-COMPAS-permutation_testing.yaml'

Please note that certain modifications needed to be made to CBS to run these tests, namely defining the protected class in the code (prior to permuting) rather than in configuration files.
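
As a rough illustration of why the protected class is defined before permuting, the sketch below shows a generic permutation test: the protected-class labels are shuffled, a statistic is recomputed on each shuffle, and the p-value is the share of shuffles that score at least as high as the observed data. This is not the code in 'permutation_testing.py'. The statistic `mean_gap` is a hypothetical stand-in for the CBS scan score, and all function names are illustrative.

```python
import random
from statistics import mean


def mean_gap(scores, protected):
    """Stand-in statistic: absolute gap in mean score between the two groups."""
    in_group = [s for s, p in zip(scores, protected) if p]
    out_group = [s for s, p in zip(scores, protected) if not p]
    return abs(mean(in_group) - mean(out_group))


def permutation_p_value(scores, protected, statistic=mean_gap,
                        n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling the protected-class labels."""
    rng = random.Random(seed)
    observed = statistic(scores, protected)
    shuffled = list(protected)  # copy so the caller's labels are untouched
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between labels and scores
        if statistic(scores, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one permutation,
    # which keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

A small p-value suggests that the observed gap is unlikely to arise if the protected-class labels were unrelated to the scores.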

We provide all the settings files, data files, and generated risk predictions for the German Credit data. To run the permutation testing on this data, change the file paths in the code to the equivalent German Credit files in the same directories.


The full results will be written to 'Permutation_Testing/permutation_testing_results'.

## Data

All data used for our research is in the 'data' folder. To find the original datasets, refer to the following links:

COMPAS - https://github.com/propublica/compas-analysis
German Credit Data - https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data
