Keywords: explainable AI, black-box explainability, post-hoc explanations, CNN
TL;DR: The paper presents a framework, called Activation-Deactivation, that replaces masking values in input perturbations with deactivation of the corresponding elements of the neural network.
Abstract: Black-box explainability methods are popular tools for explaining the decisions of image classifiers. A major drawback of these tools is their reliance on mutants obtained by occluding parts of the input, which leads to out-of-distribution images and raises doubts about the quality of the explanations. Moreover, choosing an appropriate occlusion value often requires domain knowledge. In this paper we introduce a novel forward-pass paradigm, Activation-Deactivation (AD), which removes the effects of occluded input features from the model's decision-making by switching off the parts of the model that correspond to the occlusions. We introduce CONVAD, a drop-in mechanism that can easily be added to any trained Convolutional Neural Network (CNN) and that implements the AD paradigm. This leads to more robust explanations without any additional training or fine-tuning. We prove that the CONVAD mechanism does not change the decision-making process of the network. We provide an experimental evaluation across several datasets and model architectures, comparing the quality of AD explanations with explanations obtained using a set of masking values, using the proxies of robustness, size, and confidence drop-off. We observe a consistent improvement in the robustness of AD explanations (up to 62.5%) compared to explanations obtained with occlusions, demonstrating that CONVAD extracts more robust explanations without the need for domain knowledge.
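For intuition, the core idea can be sketched in PyTorch: instead of painting an occlusion value over input pixels, zero out ("deactivate") the feature-map activations whose receptive fields cover the occluded region. The class `ADConv2d`, its `mask` argument, and the nearest-neighbour downsampling below are illustrative assumptions for a minimal sketch, not the paper's exact CONVAD implementation.

```python
# Minimal sketch of the AD idea (assumed names/details, not the paper's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADConv2d(nn.Conv2d):
    """Conv layer that 'deactivates' output units over a masked input region,
    rather than replacing input pixels with an occlusion value."""

    def forward(self, x, mask=None):
        out = super().forward(x)  # ordinary convolution
        if mask is not None:
            # Resize the input-space occlusion mask (N, 1, H, W) to the
            # feature-map resolution, then switch off the covered activations.
            m = F.interpolate(mask, size=out.shape[-2:], mode="nearest")
            out = out * (1.0 - m)
        return out

# Usage: remove a patch's effect by deactivating activations, not masking pixels.
conv = ADConv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)
mask = torch.zeros(1, 1, 32, 32)
mask[..., 8:16, 8:16] = 1.0  # region whose effect we want removed
y = conv(x, mask)            # activations under the patch are zeroed
```

Note that no occlusion value (grey, black, mean pixel, etc.) ever enters the input, which is what avoids the out-of-distribution mutants and the domain knowledge needed to choose a masking value.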
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 7985