Abstract: In this work, we instantiate a perturbation-based multi-class explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation). We demonstrate that LIPEx not only locally replicates the probability distributions output by widely used complex classification models but also provides insight into how every feature deemed important affects the prediction probability for each of the possible classes. We achieve this by defining the explanation as a matrix obtained via regression in the space of probability distributions, with respect to the Hellinger distance. Ablation tests on text and image data show that LIPEx-guided removal of important features from the data causes a larger change in the underlying model's predictions than similar tests based on other saliency- or feature-importance-based Explainable AI (XAI) methods. We also show that, compared to LIME, LIPEx is more data efficient, requiring fewer perturbations of the data to obtain a reliable explanation. This data efficiency manifests as LIPEx computing its explanation matrix 53% faster than all-class LIME in classification experiments on text data.
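To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' code) of fitting an explanation matrix by regression in the space of probability distributions: a surrogate softmax model over perturbation masks is optimized to match the classifier's output distributions under a locality-weighted squared Hellinger loss. The function names (`hellinger_sq`, `fit_lipex_matrix`) and the use of L-BFGS-B are assumptions made for illustration only.

```python
# Illustrative sketch: regress an explanation matrix W so that softmax(Z @ W.T)
# approximates the classifier's probability outputs P on perturbed inputs,
# minimizing a proximity-weighted squared Hellinger distance.
import numpy as np
from scipy.optimize import minimize

def hellinger_sq(p, q):
    # Squared Hellinger distance between two probability vectors.
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def fit_lipex_matrix(Z, P, weights):
    """Z: (N, d) binary perturbation masks; P: (N, C) classifier probabilities;
    weights: (N,) proximity-kernel weights. Returns W of shape (C, d)."""
    n_samples, n_features = Z.shape
    n_classes = P.shape[1]

    def loss(w_flat):
        W = w_flat.reshape(n_classes, n_features)
        logits = Z @ W.T                              # (N, C)
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        Q = np.exp(logits)
        Q /= Q.sum(axis=1, keepdims=True)             # surrogate distributions
        return sum(w * hellinger_sq(p, q) for w, p, q in zip(weights, P, Q))

    w0 = np.zeros(n_classes * n_features)
    res = minimize(loss, w0, method="L-BFGS-B")
    return res.x.reshape(n_classes, n_features)
```

Each row of the returned matrix would then describe how the retained features push the predicted probability of one class, which is the per-class, per-feature view the abstract refers to.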
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=W11uHaXw06&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: - The comparison to related work has been significantly revamped.
- Compared to the first submission to TMLR, the following experiments have been added:
  - a visual comparison of LIPEx against multi-class LIME
  - a check that LIPEx can reproduce human annotations better than LIME
  - experiments with ViT
  - the red curve in Figure 6
  - the LIME column in Figure 2
- Various other linguistic edits have been made.
Assigned Action Editor: ~Sanghyuk_Chun1
Submission Number: 2289