Abstract: Saliency-map explanations are intended to make computer-vision models more transparent, but it is unclear whether they help people recognise biased behaviour. We conducted a controlled online study with 40 participants who compared Layer-wise Relevance Propagation (LRP) maps from convolutional face-recognition models. A fair model was trained on a balanced synthetic dataset; two biased models were trained on data in which either light- or dark-skinned faces appeared only in frontal pose. Each participant completed 32 comparison trials. When the fair model was paired with the dark-skinned-pose-biased model, selections were near chance (52.8% favouring the fair model, binomial p = .36). When the fair model was paired with the light-skinned-pose-biased model, participants chose the biased model significantly more often than the fair one (58.1%, p = .005). Confidence ratings varied with condition and did not systematically track model fairness. These results indicate that pixel-level attribution alone does not reliably expose training bias and can, in some settings, mislead non-expert users.
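For readers who want to see how a selection rate is tested against chance, a minimal sketch using SciPy's binomtest is given below. The abstract does not report per-condition trial counts, so n_trials and k_fair are illustrative placeholders, not the study's actual data.

```python
# Hypothetical sketch of a binomial test against chance (p = 0.5),
# as used for the selection rates reported in the abstract.
from scipy.stats import binomtest

n_trials = 640                     # placeholder: not stated in the abstract
k_fair = round(0.528 * n_trials)   # placeholder count of fair-model choices (52.8%)

# Two-sided test of whether choices deviate from chance.
result = binomtest(k_fair, n_trials, p=0.5, alternative="two-sided")
print(f"{k_fair}/{n_trials} favoured the fair model, p = {result.pvalue:.3f}")
```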
DOI: 10.3233/faia250936