Abstract: Responsible use of machine learning requires models to be audited for undesirable properties. While a body of work has proposed using explanations for auditing, how and why to do so has remained relatively ill-understood. This work formalizes the role of explanations in auditing, drawing inspiration from active learning, and investigates whether and how model explanations can help audits. As an instantiation of our framework, we look at 'feature sensitivity' and propose explanation-based algorithms for auditing linear classifiers and decision trees for this property. Our results illustrate that Counterfactual explanations are extremely helpful for auditing feature sensitivity, even in the worst case. While Anchor explanations and decision paths may not be as beneficial in the worst case, they do aid significantly in the average case, as demonstrated by our experiments.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=gPtjyzXskg
Changes Since Last Submission: We have changed the abstract, introduction, and conclusion to fix the scoping issues. We have included additional experiments for the zero-feature-of-interest case, which were conducted during the rebuttal. Additionally, we have included an example of a non-testable auditing property, added a note on imperfect precision in the appendix, added notes on distribution-dependent properties and on the creation of pairs, and fixed the typo in the algorithm found by one of the reviewers.
Supplementary Material: pdf
Assigned Action Editor: ~Gintare_Karolina_Dziugaite1
Submission Number: 2253