TL;DR: We find that the selection of examples used to present model explanations to humans significantly affects their trust in the model, potentially distorting auditing outcomes.
Abstract: In model audits, explainable AI (XAI) systems are usually presented to human auditors on a limited number of examples due to time constraints. However, recent literature suggests that to establish trust in ML models, what matters is not only a model’s overall performance but also the specific examples on which it is correct. In this work, we study this hypothesis through a controlled user study with N = 320 participants. On a tabular and an image dataset, we show model explanations to users on examples categorized as ambiguous or unambiguous: for ambiguous examples, human raters disagree on the correct label, whereas for unambiguous examples they agree. We find that ambiguity can have a substantial effect on human trust, which is, however, moderated by surprising interactions between data modality and explanation quality. While unambiguous examples boost trust in explanations that remain plausible, they also help auditors identify highly implausible explanations, thereby decreasing trust. Our results suggest paying closer attention to the examples selected when presenting XAI techniques.
Submission Track: Full Paper Track
Application Domain: Computer Vision
Clarify Domain: We additionally study an application in criminal justice.
Survey Question 1: In model audits, interpretability has been shown to help auditors become aware of potential biases and unwanted behavior. In the present work, however, we show that caution must be applied when crucial decisions, such as audit approvals, are based on a limited number of examples.
Survey Question 2: Without interpretability, auditors can struggle to uncover potential biases or flaws in ML models.
Survey Question 3: We employ SHAP for tabular data and Grad-CAM for image data.
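For readers unfamiliar with these tools, the following is a minimal sketch of how such explanations can be generated using the public `shap` and `pytorch-grad-cam` packages. The models and data here are synthetic stand-ins for illustration, not the paper's actual audit setup.

```python
# Sketch: per-example explanations for a tabular and an image model.
# Synthetic models/data; assumes shap, scikit-learn, torch, torchvision,
# and pytorch-grad-cam (grad-cam on PyPI) are installed.

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# --- Tabular: SHAP attributions for a tree-based classifier ---
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # synthetic tabular features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic binary labels

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X[:1])  # per-feature attribution for one example

# --- Image: Grad-CAM heatmap for a CNN ---
import torch
from torchvision.models import resnet18
from pytorch_grad_cam import GradCAM

model = resnet18(weights=None).eval()       # untrained network, for illustration only
input_tensor = torch.randn(1, 3, 224, 224)  # synthetic image batch

cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
heatmap = cam(input_tensor=input_tensor)    # (1, 224, 224) saliency map
```

In a study like the one described above, the resulting attributions and heatmaps would be rendered alongside the selected ambiguous or unambiguous examples and shown to participants.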
Submission Number: 46