Keywords: explainability, selective prediction, amortization, interpretability, Shapley values
TL;DR: We propose selective explanations to detect when amortized explainers produce low-quality explanations and introduce explanations with initial guess to improve their quality.
Abstract: Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features.
These methods can be computationally expensive for large ML models. To address this challenge, there have been increasing efforts to develop amortized explainers, where an ML model is trained to efficiently approximate computationally expensive feature attribution scores. Despite their efficiency, amortized explainers can produce misleading explanations. In this paper, we propose selective explanations to (i) detect when amortized explainers generate inaccurate explanations and (ii) improve those explanations using a technique we call explanations with initial guess. Selective explanations allow practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers (one inference) and more computationally costly approximations (multiple inferences). Our experiments on various models and datasets demonstrate that feature attributions via selective explanations strike a favorable balance between explanation quality and computational efficiency.
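A minimal sketch of the selective-explanations pipeline described in the abstract, under stated assumptions: the helper callables `amortized_explainer`, `quality_score`, and `refine_with_initial_guess` are hypothetical placeholders, not the authors' implementation or API. The sketch only illustrates the control flow: explain every sample cheaply, flag the lowest-quality fraction, and refine only those samples starting from the amortized output as an initial guess.

```python
import numpy as np

def selective_explanations(x_batch, amortized_explainer, quality_score,
                           refine_with_initial_guess, coverage=0.8):
    """Explain a batch, refining the (1 - coverage) lowest-quality samples.

    Assumed (hypothetical) interfaces:
      amortized_explainer(x)          -> attribution vector (one inference)
      quality_score(x, attribution)   -> higher means more trustworthy
      refine_with_initial_guess(x, a) -> costlier estimate seeded with `a`
    """
    # Step 1: cheap amortized explanations for every sample.
    attributions = [amortized_explainer(x) for x in x_batch]

    # Step 2: score each explanation and flag the lowest-quality fraction.
    scores = np.array([quality_score(x, a)
                       for x, a in zip(x_batch, attributions)])
    threshold = np.quantile(scores, 1.0 - coverage)
    refine_mask = scores < threshold

    # Step 3: refine only the flagged samples, warm-started at the amortized guess.
    for i in np.where(refine_mask)[0]:
        attributions[i] = refine_with_initial_guess(x_batch[i], attributions[i])

    return attributions, refine_mask
```

In this sketch, `coverage` plays the role of the practitioner-specified fraction of samples that keep the one-inference amortized explanation; the remaining samples receive the more computationally costly explanations with initial guess.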
Supplementary Material: zip
Primary Area: Interpretability and explainability
Submission Number: 12813