Abstract: In this work, we instantiate a novel perturbation-based multi-class explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation). We demonstrate that LIPEx not only locally replicates the probability distributions output by widely used complex classification models but also provides insight into how every feature deemed important affects the prediction probability for each of the possible classes. We achieve this by defining the explanation as a matrix obtained via regression in the space of probability distributions, with respect to the Hellinger distance. Ablation tests on text and image data show that LIPEx-guided removal of important features from the data causes a larger change in the underlying model's predictions than similar tests based on other saliency-based or feature importance-based Explainable AI (XAI) methods. We also show that, compared to LIME, LIPEx is more data-efficient, requiring fewer perturbations of the data to obtain a reliable explanation. This data efficiency manifests as LIPEx computing its explanation matrix ~53% faster than all-class LIME in classification experiments on text data.
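To make the construction in the abstract concrete, here is a minimal sketch (not the authors' reference implementation) of how such an explanation matrix could be fit: binary perturbations of interpretable features are passed through a black-box predict_proba, and a surrogate whose rows index classes and columns index features is regressed against the resulting distributions under a locally weighted squared Hellinger loss. The softmax link, the kernel weighting, the helper names (lipex_like_matrix), and the toy black box are all illustrative assumptions, not details taken from the paper.

```python
# Minimal LIPEx-style sketch (assumptions noted above; not the paper's code).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hellinger_sq(p, q):
    # Squared Hellinger distance between corresponding rows of p and q.
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1)

def lipex_like_matrix(predict_proba, d, n_classes, n_samples=200, kernel_width=0.75):
    # 1. Perturb: binary on/off masks over the d interpretable features.
    Z = rng.integers(0, 2, size=(n_samples, d)).astype(float)
    Z[0] = 1.0                               # keep the unperturbed instance
    P = predict_proba(Z)                     # black-box distributions, (n, K)
    # 2. Local weights: perturbations that keep more features count more.
    dist = np.sqrt(((1.0 - Z) ** 2).sum(axis=1)) / np.sqrt(d)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Fit W (classes x features) and bias b so that softmax(Z @ W.T + b)
    #    matches P under the weighted squared Hellinger loss.
    def loss(theta):
        W = theta[: n_classes * d].reshape(n_classes, d)
        b = theta[n_classes * d:]
        Q = softmax(Z @ W.T + b)
        return np.average(hellinger_sq(P, Q), weights=w)
    theta0 = np.zeros(n_classes * d + n_classes)
    res = minimize(loss, theta0, method="L-BFGS-B")
    return res.x[: n_classes * d].reshape(n_classes, d)

# Toy black box (assumed for illustration): a fixed softmax-linear classifier.
W_true = rng.normal(size=(3, 5))
black_box = lambda Z: softmax(Z @ W_true.T)
W_explain = lipex_like_matrix(black_box, d=5, n_classes=3)
print(W_explain.round(2))   # rows = classes, columns = per-feature effects
```

Each entry of the returned matrix indicates how turning a feature on or off shifts the surrogate's probability for a given class, which is the multi-class, per-feature reading the abstract describes.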
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the second round of reviewer comments, we have added explicit demonstrations in Appendix D.2 of why multi-class LIME answers cannot be stacked into a matrix and easily interpreted, as is natural for our LIPEx method. We have also fixed some typos.
Assigned Action Editor: ~Sanghyuk_Chun1
Submission Number: 1891