Keywords: Interpretability, Alignment
TL;DR: We develop faithful interpretability methods and leverage them, together with core-region masks, to improve feature alignment in image classification.
Abstract: Despite the ubiquity of modern deep learning, accurate explanations of network predictions remain largely elusive. HiResCAM is a popular interpretability technique used to visualize attention maps (i.e., regions-of-interest) over input images. In this paper, we theoretically show a limitation of HiResCAM: the class-wise HiResCAMs for a given input are not uniquely determined, as they can all be shifted by an arbitrary spurious common matrix $M$ while still corresponding to the same prediction. We further propose *ContrastiveCAMs*, which are invariant to the spurious shift $M$, thereby improving the robustness of explanations, and which additionally provide granular class-versus-class explanations. Using these granular explanations, our experiments reveal that networks often focus on regions unrelated to the class label. To address this issue, we leverage knowledge of core image regions and propose *Core-Focused Cross-Entropy*, an extension of cross-entropy that encourages attention on core regions while suppressing unrelated regions, improving feature alignment. Experiments on Hard-ImageNet and Oxford-IIIT Pets show that ContrastiveCAMs provide more faithful attention maps and that our loss effectively improves feature alignment by deriving predictive performance primarily from core image regions.
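As a rough illustration of the shift-invariance claim in the abstract, the minimal sketch below computes HiResCAM-style maps for two classes and takes their difference, showing that adding the same matrix $M$ to every per-class map leaves the contrastive map unchanged. This is a sketch assuming PyTorch-style tensors; the helper names (`hirescam`, `contrastive_cam`) and the random feature/gradient tensors are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): HiResCAM-style maps for two classes
# and their difference, illustrating that a common additive shift M cancels out.
import torch

def hirescam(feature_maps: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    # HiResCAM: element-wise product of class-score gradients and activations,
    # summed over channels. feature_maps, grads: (C, H, W) for one image/class.
    return (grads * feature_maps).sum(dim=0)  # (H, W)

def contrastive_cam(feature_maps, grads_c1, grads_c2):
    # Class-c1-versus-class-c2 explanation: difference of the two HiResCAMs.
    return hirescam(feature_maps, grads_c1) - hirescam(feature_maps, grads_c2)

# Invariance check on random tensors: shifting both per-class maps by the same
# matrix M changes each HiResCAM but not their contrastive difference.
C, H, W = 8, 7, 7
feats = torch.randn(C, H, W)
g1, g2 = torch.randn(C, H, W), torch.randn(C, H, W)
M = torch.randn(H, W)

contrast = contrastive_cam(feats, g1, g2)
cam1, cam2 = hirescam(feats, g1), hirescam(feats, g2)
contrast_shifted = (cam1 + M) - (cam2 + M)
assert torch.allclose(contrast, contrast_shifted, atol=1e-5)
print("Common shift M cancels in the contrastive map.")
```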
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 14507