Deriving Explainable Discriminative Attributes Using Confusion About Counterfactual Class

Published: 01 Jan 2022, Last Modified: 29 Jan 2024, ICASSP 2022
Abstract: Recently, Integrated Gradients-based (IG) methods have been commonly used to explain the decision process of deep neural networks (DNNs). However, they consider only the information of the predicted class while neglecting the information of the remaining classes. In this paper, we propose a novel counterfactual explanation method, Discriminative Gradients (DiscGrad), which derives explainable discriminative attributes by considering not only the predicted class but also the counterfactual classes. Specifically, we calculate the discriminative attributes by removing the attributes of the counterfactual classes, which makes it possible to derive only the key discriminative attributes that contrast with other decisions. In addition, we determine the weights for the discriminative attributes using the degree of confusion about the counterfactual classes. We evaluate our method by measuring how much the logit decreases when important attributes are perturbed. Experimental results on widely used image and text datasets show that our proposed method outperforms the strong baseline, IG. Furthermore, we examine the relationship between class correlation and the performance of discriminative attributes to demonstrate the effectiveness of our method.
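The abstract describes combining per-class attributions by subtracting counterfactual-class attributions weighted by the model's confusion about those classes. Below is a minimal illustrative sketch (not the authors' reference implementation) of how such a combination might look, assuming precomputed IG attribution maps per class and softmax probabilities as the confusion weights; the exact formulation in the paper may differ.

```python
# Illustrative sketch: combine per-class attribution maps (e.g., from
# Integrated Gradients) into "discriminative" attributions by subtracting
# counterfactual-class attributions, weighted by the model's confusion
# (softmax probability) about each counterfactual class.
import numpy as np


def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def discriminative_attributions(attr_per_class, logits, predicted_class):
    """
    attr_per_class: array of shape (num_classes, *input_shape), one
                    attribution map per class (assumed precomputed, e.g. IG).
    logits:         array of shape (num_classes,), model outputs for the input.
    predicted_class: index of the predicted class.
    Returns an attribution map of shape input_shape.
    """
    probs = softmax(logits)
    num_classes = attr_per_class.shape[0]

    disc = attr_per_class[predicted_class].astype(float).copy()

    # Renormalize confusion weights over the counterfactual classes only.
    cf_idx = [c for c in range(num_classes) if c != predicted_class]
    cf_weights = probs[cf_idx] / probs[cf_idx].sum()

    for w, c in zip(cf_weights, cf_idx):
        # Remove attribution shared with the counterfactual class, scaled by
        # the degree of confusion about that class.
        disc -= w * attr_per_class[c]
    return disc


# Toy usage with random "attributions" for a 3-class model on a 4-feature input.
rng = np.random.default_rng(0)
attr = rng.normal(size=(3, 4))
logits = np.array([2.0, 1.5, -0.5])
print(discriminative_attributions(attr, logits, predicted_class=0))
```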