# General description

The computation of Banzhaf and Shapley values is implemented in C++.
See shap_banzhaf.cpp for general usage and shap_banzhaf_symmetric.cpp for synthetic symmetric trees.
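
For readers unfamiliar with the two values, the sketch below computes them by brute force for a tiny cooperative game, straight from their definitions. It is only an illustration: the function `banzhaf_and_shapley` and the game `toy_game` are hypothetical names that do not appear in this repository, and the enumeration is exponential in the number of players, so it is not how shap_banzhaf.cpp computes the values.

```python
from itertools import combinations
from math import factorial


def banzhaf_and_shapley(n, value):
    """Brute-force Banzhaf and Shapley values for an n-player game.

    `value` maps a frozenset of player indices to a number.
    """
    banzhaf = [0.0] * n
    shapley = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                marginal = value(s | {i}) - value(s)
                # Banzhaf: every coalition without i counts equally.
                banzhaf[i] += marginal / 2 ** (n - 1)
                # Shapley: coalitions are weighted by their size.
                shapley[i] += weight * marginal
    return banzhaf, shapley


def toy_game(coalition):
    # Hypothetical game: a coalition is worth 1 if it contains
    # player 0 and at least one other player, and 0 otherwise.
    return 1.0 if 0 in coalition and len(coalition) >= 2 else 0.0


banzhaf, shapley = banzhaf_and_shapley(3, toy_game)
print("Banzhaf:", banzhaf)
print("Shapley:", shapley)
```

In the tree setting the players are the features of a data point. Note that the Shapley values of this toy game sum to the value of the full coalition, while the Banzhaf values need not.
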
The plots and experiments are generated in IPython/Python:
- ipython/lib/run_all.py for summary plots of the datasets and of the "bad" examples, and for the MAE and RMSE distances
- shap_running_times.ipynb for running times table and plots
- shap_numerical_errors.ipynb for numerical errors plots
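
As a reminder of what the MAE and RMSE distances measure, here is a minimal sketch for two equally long vectors of per-feature values. The helper names `mae` and `rmse` and the sample vectors are assumptions made for illustration; which pairs of vectors run_all.py actually compares is not specified in this README.

```python
from math import sqrt


def mae(a, b):
    """Mean absolute error between two equally long sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical per-feature values for one data point.
values_a = [0.0, 0.5, -0.25, 1.0]
values_b = [0.0, 0.25, -0.25, 0.5]

print("MAE: ", mae(values_a, values_b))
print("RMSE:", rmse(values_a, values_b))
```

RMSE penalises a few large deviations more heavily than MAE does, which is why reporting both can be informative.
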


# Prerequisites
- Python 3.7.8
- The packages listed in requirements.txt
- clang version 6.0.0-1ubuntu2


# Build
$mkdir build
$cd build
$rm -rf *; cmake -D CMAKE_CXX_COMPILER=clang++ -D CMAKE_BUILD_TYPE=Release -D CMAKE_VERBOSE_MAKEFILE=true .. && make

# Usage
Run
./shap_banzhaf 
./shap_banzhaf_symmetric
to print the usage message.

In the most common usage (computing Banzhaf values, the fastest version), run
 ./shap_banzhaf f bst_boston.file boston.csv
where:
- bst_boston.file contains the trees in xgboost txt format
- boston.csv is the CSV file containing the data points

The results will be stored in data/boston/bst_boston.file.banzhaf_fast
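
To make the input format more concrete, the sketch below parses a small tree ensemble written in the text layout that xgboost's `dump_model` produces. This assumes that "xgboost txt format" refers to that layout. The parser `parse_xgboost_dump` and the sample trees are hypothetical and are not part of this repository. Optional statistics such as `gain` and `cover` are ignored here.

```python
import re

# Split node, e.g. "0:[f0<0.5] yes=1,no=2,missing=1"
SPLIT_RE = re.compile(r"^(\d+):\[([^<\]]+)<([^\]]+)\]\s+(.*)$")
# Leaf node, e.g. "1:leaf=-0.2" (optionally followed by ",cover=...")
LEAF_RE = re.compile(r"^(\d+):leaf=([^,\s]+)(?:,.*)?$")


def parse_xgboost_dump(text):
    """Return a list of trees; each tree maps node id -> node description."""
    trees = []
    for raw in text.splitlines():
        line = raw.strip()  # indentation only encodes depth; ids are enough
        if not line:
            continue
        if line.startswith("booster["):
            trees.append({})
            continue
        if not trees:
            raise ValueError("node line before any 'booster[...]' header")
        split = SPLIT_RE.match(line)
        if split:
            node_id, feature, threshold, rest = split.groups()
            fields = dict(item.split("=", 1) for item in rest.split(","))
            trees[-1][int(node_id)] = {
                "feature": feature,
                "threshold": float(threshold),
                "yes": int(fields["yes"]),
                "no": int(fields["no"]),
                "missing": int(fields["missing"]),
            }
            continue
        leaf = LEAF_RE.match(line)
        if leaf:
            trees[-1][int(leaf.group(1))] = {"leaf": float(leaf.group(2))}
            continue
        raise ValueError(f"unrecognised line: {raw!r}")
    return trees


# Hypothetical two-tree ensemble in the assumed layout.
SAMPLE = """booster[0]:
0:[f0<0.5] yes=1,no=2,missing=1
\t1:leaf=-0.2
\t2:[f1<1.5] yes=3,no=4,missing=3
\t\t3:leaf=0.1
\t\t4:leaf=0.3
booster[1]:
0:leaf=0.05
"""

trees = parse_xgboost_dump(SAMPLE)
print(len(trees), "trees;", len(trees[0]), "nodes in the first tree")
```
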

# Datasets
The smaller datasets are included in the repository.
To run the experiments, one needs to download the largest dataset, Flights.
Links:
- boston https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
- health insurance https://www.kaggle.com/anmolkumar/health-insurance-cross-sell-prediction?select=train.csv
- flights https://www.kaggle.com/abdurrehmankhalid/delayedflights
- nhanes https://github.com/suinleelab/treeexplainer-study

# Remark
The "shap" directory contains a copy of the public repository https://github.com/slundberg/shap/.
We use the (slightly adjusted) C implementation of the TREESHAP_PATH algorithm from that repository
(from the file shap/cext/tree_shap.h) as the "shap_orig" implementation.

