Machine Explanations and Human Understanding
Abstract: Explanations are hypothesized to improve human understanding of machine learning models and achieve a variety of desirable outcomes, ranging from model debugging to enhancing human decision making. However, empirical studies have found mixed and even negative results. An open question, therefore, is under what conditions explanations can improve human understanding and in what way. To address this question, we first identify three core concepts that cover most existing quantitative measures of understanding: task decision boundary, model decision boundary, and model error. Using adapted causal diagrams, we provide a formal characterization of the relationship between these concepts and human approximations (i.e., understanding) of them. The relationship varies by the level of human intuition in different task types, such as emulation and discovery, which are often ignored when building or evaluating explanation methods. Our key result is that human intuitions are necessary for generating and evaluating machine explanations in human-AI decision making: without assumptions about human intuitions, explanations may improve human understanding of model decision boundary, but cannot improve human understanding of task decision boundary or model error. To validate our theoretical claims, we conduct human subject studies to show the importance of human intuitions. Together with our theoretical contributions, we provide a new paradigm for designing behavioral studies towards a rigorous view of the role of machine explanations across different tasks of human-AI decision making.
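As a rough illustration of the three concepts named in the abstract, the sketch below scores a person's understanding as agreement between their guesses and (a) the ground-truth task labels, (b) the model's predictions, and (c) whether the model is wrong on each instance. This is a minimal sketch under stated assumptions: the function names, the binary-label setup, and the use of simple agreement rates are illustrative choices, not the measures defined in the paper.

```python
from typing import Dict, List, Sequence


def agreement(guesses: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of instances on which the guesses match the targets."""
    if len(guesses) != len(targets) or len(targets) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(g == t for g, t in zip(guesses, targets)) / len(targets)


def model_error(task_labels: Sequence[int], model_preds: Sequence[int]) -> List[int]:
    """1 where the model's prediction differs from the task label, else 0."""
    return [int(y != p) for y, p in zip(task_labels, model_preds)]


def understanding_scores(
    task_labels: Sequence[int],   # ground truth (task decision boundary)
    model_preds: Sequence[int],   # model output (model decision boundary)
    human_task: Sequence[int],    # human guess of the true label
    human_model: Sequence[int],   # human guess of the model's prediction
    human_error: Sequence[int],   # human guess of whether the model is wrong
) -> Dict[str, float]:
    """Hypothetical agreement-based proxies for the three kinds of understanding."""
    return {
        "task_decision_boundary": agreement(human_task, task_labels),
        "model_decision_boundary": agreement(human_model, model_preds),
        "model_error": agreement(human_error, model_error(task_labels, model_preds)),
    }


if __name__ == "__main__":
    scores = understanding_scores(
        task_labels=[1, 0, 1, 1],
        model_preds=[1, 1, 1, 0],
        human_task=[1, 0, 0, 1],
        human_model=[1, 1, 1, 0],
        human_error=[0, 0, 0, 0],
    )
    print(scores)
```

In this toy setup a person can simulate the model perfectly while still failing to tell when it errs, which is the kind of gap between the three concepts that the abstract points to.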
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
**Changes from the reviews (03/15/2023):**
- Corrected and clarified Theorem 2.
- Clarified the claim: "To the best of our knowledge, we are not aware of any existing quantitative behavioral measure of human understanding that does not belong to one of these three concepts of interest."
- Adjusted our claims and added examples throughout Section 5.1 to clarify how the proposed framework might be used to explain prior work.
- Made additional framing changes and added citations in the introduction and conclusion.

**Changes for the camera ready (04/18/2023):**
- Clarified how human studies validate the proposed framework.
- Added a paragraph in the introduction to clarify the limitation of our human studies.
- Clarified, in the first paragraph of the experiments section (Section 6), how human studies validate the proposed framework.
- Clarified the bolded claim on page 8 to include the assumed lack of human intuition.
- Added citations to Table 2.
Assigned Action Editor: ~Stefan_Lee1
Submission Number: 686