TL;DR: We give formal guarantees for feature attribution explainability methods.
Abstract: Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
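To make the smoothing idea in the abstract concrete, below is a minimal, illustrative sketch of a smoothed classifier obtained by averaging predictions over random multiplicative keep/drop masks on the features. The interface (`model.predict_proba`), the keep probability, and the Monte Carlo sampling are assumptions for illustration only; the paper's actual MuS construction differs in how masks are drawn so as to obtain its Lipschitz guarantee, so treat this as the general flavor of multiplicative smoothing rather than the method itself.

```python
# Illustrative sketch of multiplicative smoothing over features (NOT the exact MuS
# construction). Assumes `model` exposes a scikit-learn-style predict_proba that maps
# a batch of feature vectors to class probabilities.
import numpy as np

def smoothed_predict(model, x, keep_prob=0.25, n_samples=1000, seed=0):
    """Average model outputs over random multiplicative keep/drop masks.

    x         : 1-D feature vector of shape (d,)
    keep_prob : probability that each feature is kept (multiplied by 1 vs. 0)
    Returns the averaged class probabilities of the smoothed classifier.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Sample binary keep-masks and apply them multiplicatively to the input.
    masks = rng.random((n_samples, d)) < keep_prob        # shape (n_samples, d)
    masked_inputs = masks * x[None, :]                    # zero out dropped features
    probs = model.predict_proba(masked_inputs)            # shape (n_samples, n_classes)
    return probs.mean(axis=0)
```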
Submission Track: Full Paper Track
Application Domain: None of the above / Not applicable
Clarify Domain: We broadly study feature attribution methods, but many of our examples are inspired by computer vision.
Survey Question 1: We study how to build useful and reliable explainability methods, with a focus on binary-valued feature attributions. To do this, we mathematically formalize the concept of stability and present a computationally efficient, model-agnostic method, MuS, for achieving variants of this property. We show that MuS attains non-trivial guarantees on popular existing models and feature attribution methods, all without much custom engineering; an informal sketch of the stability notion appears below.
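As an informal companion to the stability notion described above, the following sketch brute-force tests whether the features selected by a binary-valued attribution keep the prediction unchanged when additional features are revealed. It reuses the hypothetical `smoothed_predict` from the earlier sketch, and the random sampling of supersets is our own illustration; the guarantee in the paper is certified analytically from a Lipschitz bound rather than checked empirically.

```python
# Illustrative (non-certified) check of stability for a binary-valued attribution.
# `alpha` is a boolean mask over features marking which ones the attribution selects.
import numpy as np

def looks_stable(model, x, alpha, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    base_pred = np.argmax(smoothed_predict(model, x * alpha))
    for _ in range(n_trials):
        # Reveal a random set of extra features on top of the selected ones.
        extra = rng.random(x.shape[0]) < 0.5
        superset = np.logical_or(alpha, extra)
        if np.argmax(smoothed_predict(model, x * superset)) != base_pred:
            return False   # found a superset of features that flips the prediction
    return True            # no counterexample found among the sampled supersets
```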
Survey Question 2: We are interested in providing formal guarantees for explainability methods, in particular binary-valued feature attributions. We study formal guarantees because they give users a principled basis for predicting how an explanation will behave. For feature attributions in particular, this lets users better interpret what to expect from the features marked as "important".
Survey Question 3: As our work studies feature attributions in general, we use LIME, SHAP, Integrated Gradients, and vanilla gradient saliency as some representative benchmarks.
Submission Number: 56