TL;DR: We answer whether post hoc explainers can faithfully explain white box models by comparing explanations to analytically-derived ground truth
Abstract: Surging interest in deep learning from high-stakes domains has precipitated concern over the inscrutable nature of black box neural networks. Explainable AI (XAI) research has led to an abundance of explanation algorithms for these black boxes. Such post hoc explainers produce human-comprehensible explanations; however, their fidelity with respect to the model is not well understood, and explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. We demonstrate the efficacy of our approach in understanding these explainers applied to symbolic expressions, neural networks, and generalized additive models on thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
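To make the idea of analytically derived ground truth concrete, here is a minimal sketch. It is not the paper's evaluation protocol. The toy model `2*x0 + x1**2`, the background-mean centering convention, and the brute-force Shapley routine are all assumptions chosen for illustration. Because the model is additive, each feature's contribution can be read directly off its own term and compared with what an explainer returns.

```python
import itertools
import math

import numpy as np


def model(X):
    # Hypothetical feature-additive model: f(x) = 2*x0 + x1**2 (no interactions).
    return 2.0 * X[:, 0] + X[:, 1] ** 2


def ground_truth(x, background):
    # Contribution of each additive term at x, centered on the term's mean
    # over the background data (an assumed convention for this sketch).
    terms_x = np.array([2.0 * x[0], x[1] ** 2])
    terms_bg = np.stack(
        [2.0 * background[:, 0], background[:, 1] ** 2], axis=1
    ).mean(axis=0)
    return terms_x - terms_bg


def shapley(f, x, background):
    # Brute-force Shapley values; features outside the coalition are
    # replaced by background samples (interventional value function).
    d = x.shape[0]

    def value(subset):
        Xs = background.copy()
        idx = list(subset)
        if idx:
            Xs[:, idx] = x[idx]
        return f(Xs).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 2))
x = np.array([1.5, -0.5])

truth = ground_truth(x, background)
explained = shapley(model, x, background)
error = np.abs(truth - explained).max()
```

For this additive toy model the two attribution vectors agree up to floating-point error. Once an interaction term such as `x0 * x1` is added, there is no longer a single per-feature term to read the contribution from, which is the harder case the abstract refers to.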
Submission Track: Full Paper Track
Application Domain: None of the above / Not applicable
Clarify Domain: We touch both computer vision and tabular data modalities, but the paper concerns general evaluation of post hoc explainers
Survey Question 1: Since AI models are commonly uninterpretable black boxes, we use explanation methods (explainers) to generate human-comprehensible explanations to gain insight into their decision-making processes. However, it is very challenging to evaluate how faithful these explanations are to the model being explained. Our work studies how well a popular class of explainers explain models where we know exactly what the ground truth should be.
Survey Question 2: The motivation to study explainability is to understand how faithful a certain popular explanation scenario (post hoc explainability) is for realistic application domains, including vision and tabular data tasks. The limitations are twofold: (1) not using any explainability in these domains can lower user trust and limit human intervention/steerability, and (2) using explainability without validating its fidelity with respect to the model can mislead users and cause harm regardless of user trust.
Survey Question 3: We use LIME, SHAP, MAPLE, Partial Dependence Plots (PDPs), and SHAPR (an extension of SHAP that handles dependent features).
Submission Number: 2