Abstract: Feature attribution explains a model's decisions by quantifying each input feature's contribution.
While many approaches rely on exact gradient measurements, recent work has adopted gradient estimation to derive explanations under query-level access, a more restrictive yet more practical access assumption known as the black-box setting.
Following this direction, this paper introduces GEFA (Gradient-estimation-based Explanation For All), a general feature attribution framework leveraging proxy gradient estimation.
Unlike the previous attempt, which focused on explaining image classifiers, the proposed explainer derives feature attributions in a proxy space, making it applicable to arbitrary black-box models regardless of input type.
Our approach, a path method built upon estimated gradients, is closely related to Integrated Gradients and, surprisingly, produces unbiased estimates of Shapley Values.
Compared to traditional sampling-based Shapley Value estimators, GEFA avoids the potential information waste arising from computing marginal contributions, thereby improving explanation quality, as demonstrated by quantitative evaluations across various settings.
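To make the core idea concrete, the sketch below illustrates a path method built on estimated gradients in miniature: gradients are estimated from queries alone via random-direction finite differences, then integrated along the straight-line path from a baseline x' to the input x, as in Integrated Gradients, IG_i(x) = (x_i - x'_i) ∫₀¹ ∂_i f(x' + α(x - x')) dα. This is a minimal illustrative sketch only; the function names (estimate_gradient, path_attribution) and the specific zeroth-order estimator are our assumptions, not the paper's actual GEFA implementation or its proxy-space construction (see the linked repository for that).

```python
import numpy as np

def estimate_gradient(f, x, eps=1e-3, n_dirs=64, rng=None):
    """Zeroth-order gradient estimate of a scalar-output model f at x,
    using central finite differences along random unit directions
    (query access only -- no autodiff)."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)
        grad += (f(x + eps * u) - f(x - eps * u)) / (2.0 * eps) * u
    # E[u u^T] = I / d for u uniform on the sphere, hence the d / n scaling.
    return grad * (x.size / n_dirs)

def path_attribution(f, x, baseline, n_steps=32, **kw):
    """Integrated-Gradients-style attribution with *estimated* gradients:
    midpoint Riemann sum over the straight path from baseline to x."""
    avg_grad = np.zeros_like(x, dtype=float)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        avg_grad += estimate_gradient(f, baseline + alpha * (x - baseline), **kw)
    return (x - baseline) * (avg_grad / n_steps)

# Toy usage: attribute a black-box quadratic score to its two inputs.
f = lambda z: float(3.0 * z[0] ** 2 + z[1])
x, baseline = np.array([1.0, 1.0]), np.zeros(2)
print(path_attribution(f, x, baseline))  # roughly [3.0, 1.0]
```

Replacing exact gradients with such query-based estimates is what lets a path method operate in the black-box setting; the abstract's claim is that a suitably constructed proxy-space variant of this scheme additionally yields unbiased Shapley Value estimates.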
Lay Summary: Understanding why and how AI models make decisions is crucial, especially in critical scenarios such as healthcare or finance.
This work aims to bring transparency to AI-supported systems by uncovering which parts of the input the AI is "looking at" when making up its mind -- a fundamental step toward the explainability of AI behavior.
To achieve this, we observe how the AI's outcomes change with its inputs and summarize these patterns to determine the supporting evidence. We further refine the explanation process for efficiency and reliability. The result is GEFA, a general-purpose tool for explaining AI decisions, designed to work across various types of models and data. We believe GEFA can serve as a valuable tool for debugging AI models, monitoring their behavior, and guiding future development with deeper insight into existing AI systems.
Link To Code: https://github.com/caiy0220/GEFA
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Explainable AI, Feature Attribution, Gradient Estimation, Shapley Value
Submission Number: 7461