Discrimination Exposed? On the Reliability of Explanations for Discrimination Detection

Published: 23 Jun 2025, Last Modified: 26 May 2025 · FAccT 2025 · CC BY 4.0
Abstract: Many rules and regulations in areas such as lending and hiring cast explanations as a safeguard against algorithmic discrimination. The underlying assumption is that, for a given model, individuals could inspect explanations of predictions to contest discriminatory outcomes or flag the model as biased. This assumption is intuitive and easy to comply with, but it is difficult to corroborate because it rests on unverifiable causal assumptions about which variables constitute proxies and how those proxies affect the outcome variable. To make accurate claims, individuals must posit these relationships and be able to detect, from the explanations, the proxies and their influence on predictions. In this work, we study whether explanations help users detect algorithmic discrimination. We formalize the problem of detecting discrimination and introduce a synthetic robot-classification task with known discrimination labels, overcoming the real-world limitation that ground truth is unknown. We then design a user study that validates participants' understanding of explanations, protected attributes, proxies, and their causal strength, and that isolates the utility of explanations. Our results show that human experts cannot reliably use explanations to flag discriminatory predictions, irrespective of how much information about the predictions they have. Because detection reliability is low even under these idealized conditions, our findings underscore the need for alternative anti-discrimination safeguards in practical settings.