Keywords: deep learning, explainability, heatmaps, out-of-distribution detection
Abstract: Perturbation methods are model-agnostic techniques that generate heatmaps to explain black-box algorithms such as deep neural networks. They work by perturbing the input image. However, perturbing parts of an input image alters its underlying structure, potentially generating out-of-distribution (OOD) data. This would violate one of the core assumptions of supervised learning, namely that the training and test data come from the same distribution.
In this study, we coin the term hermitry ratio to quantify the utility of perturbation methods by measuring the proportion of OOD samples they produce. Using this metric, we evaluate four XAI methods (occlusion analysis, LIME, Anchor LIME, and Kernel SHAP) on the image classification models ResNet50, DenseNet121, and MnasNet1.0 over three classes of the ImageNet dataset. Our results show that, to some extent, \emph{all} four perturbation methods generate OOD data regardless of architecture or image class. Occlusion analysis primarily produces in-distribution perturbations, while LIME produces mostly OOD perturbations.
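To illustrate the idea, here is a minimal sketch of such a metric, assuming the hermitry ratio is the fraction of perturbed samples that an OOD detector judges in-distribution; the function name, the `perturbed_scores` input, and the `threshold` parameter are illustrative assumptions, not the paper's actual definition or code.

```python
import numpy as np

def hermitry_ratio(perturbed_scores, threshold):
    """Fraction of perturbed samples judged in-distribution.

    perturbed_scores : OOD scores (higher = more OOD), one per perturbed
                       image, produced by any OOD detector of choice.
    threshold        : score above which a sample counts as OOD.
    """
    scores = np.asarray(perturbed_scores, dtype=float)
    in_distribution = scores <= threshold  # boolean mask of ID samples
    return in_distribution.mean()

# Example: 4 of 5 perturbed images fall below the threshold -> ratio 0.8
print(hermitry_ratio([0.1, 0.3, 0.2, 0.9, 0.4], threshold=0.5))
```

Under this reading, a ratio near 1 would indicate a perturbation method whose outputs remain in-distribution (as reported for occlusion analysis), while a ratio near 0 would indicate mostly OOD perturbations (as reported for LIME).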
One-sentence Summary: Evaluating the validity of perturbation methods for explainable deep learning