Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Shapley values, Regression adjustment, Kernel SHAP, Maximum sample reuse, Permutation SHAP
TL;DR: We combine Monte Carlo and regression-based methods to get a flexible estimator which achieves state-of-the-art performance.
Abstract: With origins in game theory, probabilistic values like Shapley values, Banzhaf values, and semi-values have emerged as a central tool in explainable AI. They are used for feature attribution, data attribution, data valuation, and more. Since all of these values require exponential time to compute exactly, research has focused on efficient approximation methods using two techniques: Monte Carlo sampling and linear regression formulations. In this work, we present a new way of combining both of these techniques. Our approach is more flexible than prior algorithms: it allows the linear regression to be replaced with any function family whose probabilistic values can be computed efficiently. This lets us harness the accuracy of tree-based models like XGBoost, while still producing unbiased estimates. In experiments across eight datasets, we find that our methods give state-of-the-art performance for estimating probabilistic values. For Shapley values, the error of our methods is up to $6\times$ lower than Permutation SHAP (the most popular Monte Carlo method), $2.75\times$ lower than Kernel SHAP (the most popular linear regression method), and $1.75\times$ lower than Leverage SHAP (the prior state-of-the-art Shapley value estimator). For more general probabilistic values, we obtain error up to $60\times$ lower than prior work.
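To make the abstract's idea concrete, the sketch below illustrates the general regression-adjustment (control-variate) pattern it builds on: fit a cheap surrogate game whose probabilistic values are known in closed form, run Monte Carlo sampling only on the residual game, and add the surrogate's exact values back, which preserves unbiasedness regardless of how well the surrogate fits. This is a minimal illustration under assumptions, not the paper's actual algorithm; the function `shapley_control_variate`, the additive least-squares surrogate, and all parameter choices are hypothetical stand-ins (the paper allows richer function families such as XGBoost).

```python
import numpy as np

def shapley_control_variate(v, n, n_perms=200, n_fit=500, rng=None):
    """Illustrative regression-adjusted permutation estimator (not the paper's method).

    v : value function taking a boolean inclusion mask of length n.
    Fits an additive surrogate g(S) = c + sum_{i in S} w_i, whose Shapley
    values are exactly w, then runs permutation sampling on the residual
    game v - g and adds w back. The adjustment leaves the estimate
    unbiased while (ideally) reducing variance.
    """
    rng = np.random.default_rng(rng)

    # Step 1: fit the additive surrogate by least squares on random coalitions.
    masks = rng.random((n_fit, n)) < 0.5
    X = np.hstack([masks.astype(float), np.ones((n_fit, 1))])  # indicators + intercept
    y = np.array([v(m) for m in masks])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    w, c = coef[:n], coef[n]  # Shapley values of the additive game g are exactly w

    def g(mask):
        return c + w @ mask

    # Step 2: permutation sampling on the residual game v - g.
    resid = np.zeros(n)
    for _ in range(n_perms):
        order = rng.permutation(n)
        mask = np.zeros(n, dtype=bool)
        prev = v(mask) - g(mask)
        for i in order:
            mask[i] = True
            cur = v(mask) - g(mask)
            resid[i] += cur - prev  # marginal contribution of player i
            prev = cur
    resid /= n_perms

    # Step 3: add back the surrogate's exact Shapley values.
    return w + resid

if __name__ == "__main__":
    # Toy cooperative game with quadratic synergies among 6 players.
    n = 6
    a = np.arange(1, n + 1, dtype=float)
    v = lambda mask: float((a @ mask) ** 2)
    print(shapley_control_variate(v, n, rng=0))
```

The key design point the sketch shows: for any surrogate g, the Shapley values of v equal the Shapley values of v − g plus those of g, so sampling only needs to cover the (hopefully small) residual. Swapping the additive surrogate for a model class with efficiently computable probabilistic values, as the abstract describes, keeps this decomposition intact.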
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 8399