Evaluations and Methods for Explanation through Robustness Analysis

Cheng-Yu Hsieh; Chih-Kuan Yeh; Xuanqing Liu; Pradeep Ravikumar; Seungyeon Kim; Sanjiv Kumar; Cho-Jui Hsieh

Evaluations and Methods for Explanation through Robustness Analysis

Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: We propose new objective measurement for evaluating explanations based on the notion of adversarial robustness. The evaluation criteria further allows us to derive new explanations which capture pertinent features qualitatively and quantitatively.

Abstract: Among multiple ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive way to explain a model. In this paper, we establish the link between a set of features to a prediction with a new evaluation criteria, robustness analysis, which measures the minimum tolerance of adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract a set of features that provides most robust support for a current prediction, and also can extract a set of features that contrasts the current prediction to a target class by setting a targeted adversarial attack. By applying this methodology to various prediction tasks across multiple domains, we observed the derived explanations are indeed capturing the significant feature set qualitatively and quantitatively.

Keywords: Interpretability, Explanations, Adversarial Robustness

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/evaluations-and-methods-for-explanation/code)

Original Pdf: pdf

11 Replies

Loading