The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective

Published: 27 Jun 2024, Last Modified: 17 Sept 2024, Accepted by TMLR, License: CC BY 4.0
Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and examine how practitioners resolve them. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.
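The quantitative framework referenced in the abstract compares explanations at the level of their top-k most important features. The sketch below is a minimal illustration of that idea, not the authors' reference implementation: it assumes explanations are given as per-feature attribution vectors, ranks features by absolute attribution, and computes a set-overlap "feature agreement" and a stricter positional "rank agreement"; the function names and the example vectors are assumptions for illustration (see the linked repository for the paper's own code).

# Minimal sketch (not the authors' reference implementation) of two top-k
# disagreement measures of the kind described in the abstract:
# feature agreement (overlap of top-k feature sets) and rank agreement
# (top-k features that also appear at the same rank in both explanations).
import numpy as np

def top_k_features(attribution, k):
    # Indices of the k features with largest absolute attribution, in ranked order.
    return list(np.argsort(-np.abs(attribution))[:k])

def feature_agreement(attr_a, attr_b, k):
    # Fraction of top-k features shared by the two explanations.
    top_a, top_b = top_k_features(attr_a, k), top_k_features(attr_b, k)
    return len(set(top_a) & set(top_b)) / k

def rank_agreement(attr_a, attr_b, k):
    # Fraction of top-k features that occupy the same rank in both explanations.
    top_a, top_b = top_k_features(attr_a, k), top_k_features(attr_b, k)
    return sum(a == b for a, b in zip(top_a, top_b)) / k

if __name__ == "__main__":
    # Hypothetical attribution vectors from two different explanation methods.
    attr_method_1 = np.array([0.40, -0.10, 0.25, 0.05, -0.30])
    attr_method_2 = np.array([0.35, 0.20, -0.05, 0.10, -0.45])
    print(feature_agreement(attr_method_1, attr_method_2, k=3))  # 0.666...
    print(rank_agreement(attr_method_1, attr_method_2, k=3))     # 0.0

The example shows how two explanations can share most of their top features (high feature agreement) while ordering them differently (low rank agreement), which is the kind of disagreement the paper quantifies.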
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have incorporated changes suggested by both the reviewers and the Action Editor. Specifically, the changes we made in the draft are as follows:
-- To address concerns about the age of the studied methods, we conducted additional experiments incorporating the Learning to eXplain (L2X) method. We've added these results to Appendix D.1, comparing L2X with our original six explanation methods across all metrics for the COMPAS dataset. Our findings show that L2X's disagreement with other methods is at least as high as the disagreement between our initially studied methods.
-- We introduced a new "weighted rank agreement" metric to address reviewers' concerns about the strictness of our original rank agreement metric. This softer version considers differences in ranks between the top-k features of any two given explanations and is detailed in Appendix D.2.1, with results shown in Figure 12.
-- We computed rank correlation and pairwise rank agreement metrics on the top k features, as suggested by the reviewers. These new analyses are presented in Appendices D.2.2 and D.2.3, showing similar patterns of disagreement as our original results.
-- We added a comparison of disagreement using the same number of top features (k = 1, 4, 7) for the COMPAS and German Credit datasets, included in Figure 2 and Figure 7 of the updated draft.
-- To address concerns about dataset complexity, we conducted additional experiments using more complex tabular datasets: the Forest Cover Type (54 features) and Gas Concentration (128 features) datasets. The corresponding results can be found in Appendix D.3.
-- We clarified our methodology for selecting the prompts used in our user study. See the third paragraph of Section 5.1 for more details.
-- We've expanded our analysis of our user study by adding a new section (Appendix E.8) that breaks down results for academic and industry participants. This includes new figures showing how each group favors different explanations during disagreements, and tables analyzing the main themes in their decision-making processes.
-- We have expanded our discussion in Section 6 to highlight the focus of our work on establishing the prevalence of explanation disagreement rather than exploring its underlying causes. We also included a discussion of follow-up work that examined the sources of explanation disagreements and prescribed systematic ways for choosing among explanations when such disagreements arise.
-- We discussed the variability of our conclusions with respect to different disagreement metrics in Remark 1 (at the end of Section 5) as well as in the conclusion (Section 6).
-- We corrected other inconsistencies and errors pointed out by the reviewers: (1) we fixed the number of predictive models mentioned in the abstract and conclusion; (2) we revised the discussion of our findings about Integrated Gradients, LIME, and KernelSHAP in Section 4.3.2.
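For readers unfamiliar with the top-k variants mentioned in the change list, the sketch below shows one plausible way to restrict rank correlation and pairwise rank agreement to the top-k features; it is an illustrative assumption rather than the definitions used in Appendices D.2.2 and D.2.3. In particular, the helper names, the choice of Spearman correlation, and the restriction to the top-k features of the first explanation are all assumptions made here.

# Illustrative sketch (assumptions, not the paper's Appendix D.2 definitions):
# rank correlation and pairwise rank agreement restricted to the top-k features
# of the first explanation, computed over per-feature attribution vectors.
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

def top_k_rank_correlation(attr_a, attr_b, k):
    # Spearman rank correlation between absolute attributions, restricted to
    # the k features ranked most important by the first explanation.
    top_k = np.argsort(-np.abs(attr_a))[:k]
    corr, _ = spearmanr(np.abs(attr_a)[top_k], np.abs(attr_b)[top_k])
    return corr

def top_k_pairwise_rank_agreement(attr_a, attr_b, k):
    # Fraction of top-k feature pairs whose relative importance ordering is the
    # same under both explanations (requires k >= 2).
    top_k = np.argsort(-np.abs(attr_a))[:k]
    pairs = list(combinations(top_k, 2))
    agree = sum(
        (np.abs(attr_a)[i] > np.abs(attr_a)[j]) == (np.abs(attr_b)[i] > np.abs(attr_b)[j])
        for i, j in pairs
    )
    return agree / len(pairs)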
Code: https://github.com/AI4LIFE-GROUP/disagreement-problem
Assigned Action Editor: ~Jessica_Schrouff1
Submission Number: 2110