Exploring unfairness in Integrated Gradients based attribution methods

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone
Keywords: Integrated Gradients, Expected Gradients, Explainable AI, Integrated Certainty Gradients, Attribution
Abstract: Numerous methods have attempted to explain and interpret predictions made by machine learning models in terms of their inputs. Known as “attribution methods”, they notably include the Integrated Gradients method and its variants. These are based upon the theory of Shapley Values, a rigorous method of fair allocation according to mathematical axioms. Integrated Gradients inherits axioms from this heritage, implying a similarly rigorous, intuitive notion of fairness. We explore the difference between Integrated Gradients and more direct expressions of Shapley Values in deep learning and find that the fairness guarantees of Integrated Gradients are weaker; in certain conditions it can give wholly unrepresentative results. Integrated Gradients requires a choice of “baseline”, a hyperparameter that represents the “zero attribution” case. Research has shown that baseline choice critically affects attribution quality, and increasingly effective baselines have been developed. Using purpose-designed scenarios, we identify sources of inaccuracy, both from specific baselines and inherent to the method itself, that are sensitive to the input distribution and the loss landscape. Failure modes are identified for baselines including Zero, Mean, Additive Gaussian Noise, and the state-of-the-art Expected Gradients. We develop a new method, Integrated Certainty Gradients, which we show avoids the failures in these challenging scenarios. By augmenting the input space with “certainty” information and training with random degradation of input features, the model learns to predict with varying amounts of incomplete information, supporting a zero-information case that becomes a natural baseline. We identify the axiomatic origin of unfairness in Integrated Gradients, which has been overlooked in past research.
One-sentence Summary: Failure cases of Integrated Gradients based attribution methods are identified and explained, and a possible solution is explored.
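To make the baseline discussion in the abstract concrete, here is a minimal NumPy sketch of the two standard techniques it names: Integrated Gradients, IG_i(x) = (x_i - x'_i) * integral over a in [0, 1] of df/dx_i(x' + a(x - x')), and Expected Gradients, which averages the same integrand over baselines x' drawn from reference data and a drawn uniformly from [0, 1]. This is not the authors' code. The toy model `model`, its hand-written gradient `grad_model`, and the helper names `integrated_gradients` and `expected_gradients` are hypothetical, the integral is approximated with a midpoint Riemann sum, and Integrated Certainty Gradients is not sketched because the abstract does not specify its training details.

```python
import numpy as np

def model(x):
    # Hypothetical toy model: f(x) = tanh(x0) * x1 + 0.5 * x2
    return np.tanh(x[0]) * x[1] + 0.5 * x[2]

def grad_model(x):
    # Analytic gradient of the toy model above, one entry per input feature.
    return np.array([
        (1.0 - np.tanh(x[0]) ** 2) * x[1],
        np.tanh(x[0]),
        0.5,
    ])

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients along the
    straight-line path from `baseline` to `x`."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    path_grads = np.array([grad_model(baseline + a * (x - baseline)) for a in alphas])
    # Scale the path-averaged gradient by the input's offset from the baseline.
    return (x - baseline) * path_grads.mean(axis=0)

def expected_gradients(x, reference_data, samples=2000, seed=0):
    """Monte Carlo Expected Gradients: each sample draws a baseline row from
    `reference_data` and a path position alpha ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x)
    for _ in range(samples):
        b = reference_data[rng.integers(len(reference_data))]
        a = rng.uniform()
        total += (x - b) * grad_model(b + a * (x - b))
    return total / samples

# Usage: a zero baseline; completeness says attributions sum to f(x) - f(baseline).
x = np.array([1.0, 2.0, -1.0])
zero = np.zeros(3)
ig = integrated_gradients(x, zero)
print(ig, ig.sum(), model(x) - model(zero))
```

Swapping `baseline` (for example zeros versus a feature mean) changes the individual attributions while their sum still equals f(x) - f(baseline), which is one way to see why the choice of baseline, rather than the completeness axiom alone, determines whether the attributions are representative.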