Keywords: Feature Attribution, Explainable AI, Banzhaf Value, Probabilistic Value
TL;DR: We propose Dynamic Banzhaf, a game-theoretic attribution method that replaces fixed coalition sampling with per-feature masking probabilities optimized against a user-defined objective function.
Abstract: Game-theoretic attribution methods approximate the target model as a cooperative game and evaluate feature importance as a payoff allocation to the input features. Most methods use well-known game-theoretic solutions such as the Shapley value because they satisfy key desirable axioms. However, the strict assumptions of game theory reduce the flexibility of the resulting explanations: in particular, most methods use fixed coalition sampling distributions, preventing the dynamic alignment of explanations with user criteria. To address this gap, we introduce Dynamic Banzhaf, a game-theoretic attribution method that optimizes the masking probability of each feature with respect to a user-defined objective function. We provide a theoretical proof of the convergence of Dynamic Banzhaf, discuss optimal probability selection, and empirically demonstrate the effect of probability adjustment on the quality of explanations of machine learning models. Our results indicate that masking probabilities can be calibrated to improve the alignment of explanations with user criteria, highlighting the value of dynamic probability selection in game-theoretic attribution.
Primary Area: interpretability and explainable AI
Submission Number: 10408
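To make the mechanism in the abstract concrete, below is a minimal sketch, assuming a probabilistic-value estimator in the style of the Banzhaf value: each feature j enters a sampled coalition independently with probability p[j], and p is tuned by finite-difference ascent on a user-defined objective. All names here (value_fn, user_objective, estimate_attributions, tune_probabilities) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_attributions(value_fn, p, n_samples=256, rng=None):
    """Monte Carlo estimate of per-feature marginal contributions.

    value_fn: maps a boolean mask of shape (d,) to a scalar payoff,
              e.g. the model output with masked-out features replaced
              by a baseline value.
    p:        per-feature inclusion probabilities, shape (d,).
    """
    rng = rng or np.random.default_rng(0)
    d = len(p)
    phi = np.zeros(d)
    for _ in range(n_samples):
        mask = rng.random(d) < p  # sample one coalition feature-wise
        for i in range(d):
            # Marginal contribution of feature i to this coalition.
            with_i, without_i = mask.copy(), mask.copy()
            with_i[i], without_i[i] = True, False
            phi[i] += value_fn(with_i) - value_fn(without_i)
    return phi / n_samples

def tune_probabilities(value_fn, user_objective, d, steps=50, lr=0.1, rng=None):
    """Finite-difference ascent of user_objective over p in (0, 1)^d."""
    rng = rng or np.random.default_rng(0)
    p = np.full(d, 0.5)  # p = 1/2 recovers classical Banzhaf sampling
    eps = 0.05
    for _ in range(steps):
        grad = np.zeros(d)
        for j in range(d):
            # Noisy two-sided estimate of d(objective)/dp[j].
            p_hi, p_lo = p.copy(), p.copy()
            p_hi[j] = min(p[j] + eps, 0.99)
            p_lo[j] = max(p[j] - eps, 0.01)
            hi = user_objective(estimate_attributions(value_fn, p_hi, rng=rng))
            lo = user_objective(estimate_attributions(value_fn, p_lo, rng=rng))
            grad[j] = (hi - lo) / (p_hi[j] - p_lo[j])
        p = np.clip(p + lr * grad, 0.01, 0.99)
    return p
```

Setting every probability to 1/2 recovers the uniform coalition sampling of the classical Banzhaf value, so the fixed-distribution setting the abstract criticizes appears here as a special case of the tunable one.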