Abstract: As visual interpretations for convolutional neural networks (CNNs), backpropagation attribution methods have been garnering growing attention. Nevertheless, the majority of these methods concentrate solely on the last convolutional layer, producing small, concentrated interpretations that fail to adequately explain the model's attention. We therefore propose a precise attribution method (i.e., Holistic-CAM) for high-definition visual interpretation across the holistic stage of CNNs. Specifically, we first present weighted positive gradients to guarantee the sanity of interpretations in shallow layers, and leverage multi-scale fusion to improve resolution across the holistic stage. We then propose fundamental scale denoising to eliminate unfaithful attributions that originate from fusing larger-scale components. The proposed method simultaneously renders fine-grained and faithful attributions for CNNs from shallow to deep layers. Extensive experimental results on ImageNet-1k demonstrate that Holistic-CAM outperforms state-of-the-art methods on commonly used benchmarks, including deletion and insertion, the energy-based pointing game, and remove-and-debias; it also easily passes the sanity check.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Multimedia interpretation is of great importance, as it helps humans trust and make better use of media-related models. We propose a novel visual interpretation method (i.e., Holistic-CAM). Our method tackles the deficiencies of existing interpretation methods and provides interpretations in a lucid and understandable manner. This enables humans to better comprehend and articulate the operational procedures and reasoning behind the decision-making of media-related deep-learning models.
Supplementary Material: zip
Submission Number: 4968