Keywords: Interpretability, Explainability
Abstract: Explaining the predictions made by complex machine learning models helps users to understand and accept the predicted outputs with confidence. One promising way is to use similarity-based explanation that provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated relevance metrics that can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, which would be a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons of their failure. We expect our insights to help practitioners in selecting appropriate relevance metrics and also aid further researches for designing better relevance metrics for explanations.
One-sentence Summary: We investigated empirically which of the relevance metrics (e.g. similarity of hidden layer, influence function, etc.) are appropriate for similarity-based explanation.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Code: [![github](/images/github_icon.svg) k-hanawa/criteria_for_instance_based_explanation](https://github.com/k-hanawa/criteria_for_instance_based_explanation) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=9uvhpyQwzM_)
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [MNIST](https://paperswithcode.com/dataset/mnist)