Abstract: Feature attribution explains a model's decisions by quantifying each input feature's contribution.
While many approaches rely on exact gradient measurements, recent work has adopted gradient estimation to derive explanations under query-level access, a more restrictive yet more practical access assumption known as the black-box setting.
Following this direction, this paper introduces GEFA (Gradient-estimation-based Explanation For All), a general feature attribution framework leveraging proxy gradient estimation.
Unlike the previous attempt, which focused on explaining image classifiers, the proposed explainer derives feature attributions in a proxy space, making it applicable to arbitrary black-box models regardless of input type.
Our approach, a path method built upon estimated gradients, is closely related to Integrated Gradients and, surprisingly, produces unbiased estimates of Shapley Values.
Compared to traditional sampling-based Shapley Value estimators, GEFA avoids the potential information waste arising from computing marginal contributions, thereby improving explanation quality, as demonstrated by quantitative evaluations across various settings.
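To make the core idea concrete, the sketch below illustrates a path method built on estimated gradients in miniature: gradients are estimated from queries alone via random-direction finite differences, then integrated along the straight-line path from a baseline x' to the input x, as in Integrated Gradients, IG_i(x) = (x_i - x'_i) ∫₀¹ ∂_i f(x' + α(x - x')) dα. This is a minimal illustrative sketch only; the function names (estimate_gradient, path_attribution) and the specific zeroth-order estimator are our assumptions, not the paper's actual GEFA implementation or its proxy-space construction (see the linked repository for that).

```python
import numpy as np

def estimate_gradient(f, x, eps=1e-3, n_dirs=64, rng=None):
    """Zeroth-order gradient estimate of a scalar-output model f at x,
    using central finite differences along random unit directions
    (query access only -- no autodiff)."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)
        grad += (f(x + eps * u) - f(x - eps * u)) / (2.0 * eps) * u
    # E[u u^T] = I / d for u uniform on the sphere, hence the d / n scaling.
    return grad * (x.size / n_dirs)

def path_attribution(f, x, baseline, n_steps=32, **kw):
    """Integrated-Gradients-style attribution with *estimated* gradients:
    midpoint Riemann sum over the straight path from baseline to x."""
    avg_grad = np.zeros_like(x, dtype=float)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        avg_grad += estimate_gradient(f, baseline + alpha * (x - baseline), **kw)
    return (x - baseline) * (avg_grad / n_steps)

# Toy usage: attribute a black-box quadratic score to its two inputs.
f = lambda z: float(3.0 * z[0] ** 2 + z[1])
x, baseline = np.array([1.0, 1.0]), np.zeros(2)
print(path_attribution(f, x, baseline))  # roughly [3.0, 1.0]
```

Replacing exact gradients with such query-based estimates is what lets a path method operate in the black-box setting; the abstract's claim is that a suitably constructed proxy-space variant of this scheme additionally yields unbiased Shapley Value estimates.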
Lay Summary: Understanding why and how AI models make decisions is crucial, especially in critical scenarios such as healthcare or finance.
This work aims to bring transparency to AI-supported systems by uncovering which parts of the input the AI is "looking at" when making up its mind -- a fundamental step toward the explainability of AI behavior.
To achieve this, we observe how the AI's outcomes change with its inputs and summarize these patterns to determine the supporting evidence. We further refine the explanation process for efficiency and reliability. The result is GEFA, a general-purpose tool for explaining AI decisions, designed to work across various types of models and data. We believe GEFA can serve as a valuable tool for debugging AI models, monitoring their behavior, and guiding future development with deeper insight into existing AI systems.
Link To Code: https://github.com/caiy0220/GEFA
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Explainable AI, Feature Attribution, Gradient Estimation, Shapley Value
Submission Number: 7461