Sampling Matters in Explanations: Towards Robust Attribution Analysis with Feature Suppression

TMLR Paper 629 Authors

23 Nov 2022 (modified: 28 Feb 2023) · Rejected by TMLR
Abstract: Pixel-wise image attribution analysis seeks to highlight the subset of semantic features in an input that reflects the interactions between those features and the model's inferences. The gradient map of a decision risk value with respect to the input highlights a fraction of the interactive features relevant to the inference. Gradient integration is a pixel-wise attribution approach that draws multiple samples from a given input and then sums the gradient maps derived from those samples to form the explanation. Our theoretical analysis demonstrates that the alignment of the sampling distribution delimits the upper bound of explanation certainty. Prior works sample from normal or uniform distributions, and the misalignment of these distributions can thus lead to low explanation certainty; furthermore, their explanations can fail when models are trained with data augmentation, owing to the resulting skewed distribution. We present a semi-ideal sampling approach that improves explanation certainty by simply suppressing features. This approach aligns with the natural image feature distribution and preserves intuition-aligned features without adding agnostic information. Further theoretical analysis from the perspective of cooperative game theory shows that our approach is in fact equivalent to an estimation of Shapley values. Extensive quantitative evaluation on ImageNet further affirms that our approach yields more satisfactory explanations, preserving more information than state-of-the-art baselines.
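For concreteness, the following is a minimal sketch (not the authors' code) of gradient integration in which samples are drawn by suppressing random feature subsets of the input, rather than by adding normal or uniform noise. The function name, the keep-probability of 0.5, and the number of samples are illustrative assumptions.

```python
import torch

def feature_suppression_attribution(model, x, target, num_samples=32, generator=None):
    """Average gradient maps of the target logit over feature-suppressed copies of `x`.

    model : callable mapping (N, C, H, W) images to (N, num_classes) logits
    x     : a single input image of shape (C, H, W)
    target: index of the class score being explained
    """
    attributions = torch.zeros_like(x)
    for _ in range(num_samples):
        # Suppress a random subset of pixels (set them to zero) so that each
        # sample stays close to the natural image feature distribution instead
        # of carrying added noise; the 0.5 keep-probability is an assumption.
        keep_mask = (torch.rand(x.shape[-2:], generator=generator) < 0.5).to(x.dtype)
        sample = (x * keep_mask).clone().requires_grad_(True)

        # Gradient of the target class score with respect to the sample.
        score = model(sample.unsqueeze(0))[0, target]
        grad = torch.autograd.grad(score, sample)[0]

        # Sum the gradient maps over samples, as described in the abstract.
        attributions += grad

    return attributions / num_samples
```

In this sketch the sampling distribution is defined entirely by which features are kept or suppressed, which is the design choice the abstract contrasts with normal- or uniform-noise sampling.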
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jessica_Schrouff1
Submission Number: 629