Do Explanations Expose Bias? How Saliency Maps Affect Judgements of Biased Face-Recognition Models

Justyn Rodrigues, Krista A. Ehinger, Oliver Obst, X. Rosalind Wang

Published: 21 Oct 2025, Last Modified: 08 Feb 2026. Licence: CC BY-SA 4.0
Abstract: Saliency-map explanations are intended to make computer-vision models more transparent, but it is unclear whether they help people recognise biased behaviour. We conducted a controlled online study with 40 participants who compared Layer-wise Relevance Propagation maps from convolutional face-recognition models. A fair model was trained on a balanced synthetic dataset; two biased models were trained on data in which either light- or dark-skinned faces appeared only in frontal pose. Each participant completed 32 comparison trials. When the fair model was paired with the dark-skinned-pose-biased model, selections were near chance (52.8% favouring the fair model, binomial p = .36). When the fair model was paired with the light-skinned-pose-biased model, participants chose the biased model significantly more often (58.1%, p = .005). Confidence ratings varied with condition and did not systematically track model fairness. These results indicate that pixel-level attribution alone does not reliably expose training bias and can, in some settings, mislead non-expert users.
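The abstract reports exact binomial tests against a chance rate of 50%. As a minimal sketch of how such a two-sided p-value is computed, the function below implements the test from first principles with the standard library; the example counts are hypothetical and are not the study's raw data (the per-condition trial totals are not stated in the abstract).

```python
from math import comb


def binom_two_sided(k: int, n: int) -> float:
    """Two-sided exact binomial test of k successes in n trials
    against a null success probability of 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided
    p-value is twice the smaller tail probability, capped at 1.
    """
    # P(X >= k) under Binomial(n, 0.5)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # P(X <= k) under Binomial(n, 0.5)
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


# Hypothetical illustration: 7 "fair model chosen" outcomes in 10 trials
print(binom_two_sided(7, 10))  # 0.34375 — not distinguishable from chance
```

A selection rate close to 50% (as in the dark-skinned-pose condition) yields a large p-value, while a rate far from 50% over many trials yields a small one, which is the pattern the reported p = .36 and p = .005 reflect.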