Shapley Values of Structured Additive Regression Models and Application to RKHS Weightings of Functions

TMLR Paper3300 Authors

06 Sept 2024 (modified: 28 Nov 2024). Under review for TMLR. CC BY 4.0
Abstract: The ability to interpret machine learning models is increasingly valuable, as their use in sensitive domains requires trust. Work to improve explanation methods, especially for complex models, is therefore of high importance. With this in mind, the purpose of this paper is twofold. First, we present an algorithm for efficiently calculating the Shapley values of a family of models, Structured Additive Regression (STAR) models, which allow more variable interactions than Generalized Additive Models (GAMs). Second, we present a new instantiation of the RKHS Weightings of Functions paradigm, better adapted to regression, and show how to transform it and other RKHS Weightings instantiations into STAR models. We thus introduce a new family of STAR models, as well as the means to interpret their outputs in a timely manner.
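To illustrate why a structured additive form can make exact Shapley values cheap, here is a minimal sketch. It is not the paper's algorithm. It assumes a model that is a sum of terms, each depending on a small subset of features, and a single-baseline value function in which features outside a coalition are replaced by a reference point. The names `terms`, `masked`, `shapley_by_terms` and `shapley_brute_force` are hypothetical. Under these assumptions, linearity of the Shapley value lets each term be explained separately over its own features only, so the cost grows with the size of the largest term rather than with the total number of features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, players):
    """Exact Shapley values by enumerating every coalition of `players`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            # Shapley weight of a coalition of size r that excludes player i
            w = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(others, r):
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi[i] = total
    return phi


def masked(x, baseline, coalition):
    """Keep x[j] for features in the coalition, use the baseline elsewhere."""
    return [x[j] if j in coalition else baseline[j] for j in range(len(x))]


# Hypothetical additive model: each term reads only the features listed with it.
terms = [
    ((0,), lambda z: 2.0 * z[0]),
    ((1, 2), lambda z: z[1] * z[2]),
    ((2, 3), lambda z: z[2] + z[3] ** 2),
]


def model(z):
    return sum(f(z) for _, f in terms)


def shapley_by_terms(x, baseline):
    """Sum per-term Shapley values; each term enumerates only its own features."""
    phi = [0.0] * len(x)
    for feats, f in terms:
        v = lambda S, f=f: f(masked(x, baseline, S))
        for j, val in exact_shapley(v, list(feats)).items():
            phi[j] += val
    return phi


def shapley_brute_force(x, baseline):
    """Reference computation over all 2^d coalitions of the full model."""
    v = lambda S: model(masked(x, baseline, S))
    res = exact_shapley(v, list(range(len(x))))
    return [res[j] for j in range(len(x))]
```

Both functions return the same attributions for this toy model, but `shapley_by_terms` enumerates at most four coalitions per term, whereas `shapley_brute_force` enumerates all sixteen coalitions of the four features.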
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: The following sections have been added or modified.
1 Introduction: The last three paragraphs have been reworked.
2.2 Shapley values: Major additions to the section.
2.3 RKHS Weightings: Added clarifications to better define RKHS Weightings.
3 Shapley values of STAR models: The final paragraph before 3.1 explains the computational complexity of our algorithm.
3.1 Comparison to other Shapley value algorithms: New section comparing our algorithm to others.
3.2 Applications of STAR-SHAP: New section describing models compatible with our algorithm.
5.1 Generating synthetic datasets: New section explaining how we generated random models and datasets.
5.2 Computation time of the Shapley values: Added multiple algorithms to the experiment. Added Figure 1b. Adjusted the text accordingly. Added another experiment (Figure 2), which examines the quality of the returned Shapley values with regard to the computation time.
5.4 Time series prediction performance: The final paragraph is new, and places this experiment in the context of future research.
5.5 Shapley values comparison: New experiment (Figures 3 and 4) comparing the actual Shapley values of an Explainable Boosting Machine and a STAR RKHS Weighting.
5.6 Discussion: New section summarizing the conclusions we draw from the experiments.
We also improved the language, fixed other mistakes, and added new citations where relevant.
Assigned Action Editor: ~Dennis_Wei1
Submission Number: 3300