Keywords: saliency, attribution, interpretability, sanity checks
TL;DR: We devise a mechanism called {\em competition for pixels} that allows (approximately) complete saliency methods to pass the sanity checks.
Abstract: {\em Saliency methods} attempt to explain a deep net's decision by assigning a {\em score} to each feature/pixel in the input, often doing this credit assignment via the gradient of the output with respect to the input.
Recently, \citet{adebayosan} questioned the validity of many of these methods because they fail simple {\em sanity checks}, which test whether the scores shift or vanish when layers of the trained net are randomized, or when the net is retrained using random labels for the inputs; surprisingly, the tested methods produced explanations that were relatively unchanged under these perturbations.
We propose a simple fix, which we call {\em competition for pixels}, that helps existing saliency methods pass the sanity checks. It involves computing saliency maps for all possible labels in the classification task and running a simple competition among them to identify and remove less relevant pixels from the map. We give some theoretical justification for this mechanism and empirically demonstrate its performance on several popular methods.
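To make the mechanism concrete, below is a minimal sketch of competition for pixels layered on top of the standard gradient $\odot$ input saliency method. The function name, the use of logits, and the exact winner-take-all rule (a pixel survives in the target label's map only if that label assigns it the highest saliency among all labels) are illustrative assumptions based on the abstract's description, not the authors' exact implementation:

```python
# Minimal sketch (assumptions noted above): "competition for pixels"
# applied to gradient-times-input saliency maps.
import torch

def competitive_gradient_x_input(model, x, target_label):
    """Return a gradient*input saliency map for `target_label` in which
    a pixel is kept only if `target_label` wins the competition, i.e.
    assigns that pixel the highest saliency among all class labels."""
    x = x.clone().requires_grad_(True)             # x: (1, C, H, W)
    logits = model(x)                              # (1, num_classes)
    num_classes = logits.shape[1]

    # One gradient*input map per label, all computed on the same input.
    maps = []
    for label in range(num_classes):
        grad = torch.autograd.grad(logits[0, label], x,
                                   retain_graph=True)[0]
        maps.append((grad * x).detach().squeeze(0))    # (C, H, W)
    maps = torch.stack(maps)                           # (num_classes, C, H, W)

    # Competition: zero out pixels where some other label's map is larger.
    winners = maps.argmax(dim=0)                       # (C, H, W)
    target_map = maps[target_label]
    return torch.where(winners == target_label,
                       target_map, torch.zeros_like(target_map))
```

Under this rule, pixels that score highly for many labels (and so carry little label-specific information) are removed from the map, which is the intuition for why the resulting maps can change when the net's weights or training labels are randomized.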