Keywords: FairML, Sensitivity Analysis, Causal Inference
TL;DR: We use causal sensitivity analysis to understand the effect of measurement bias on the evaluation of fairness metrics. We apply this framework to analyze the sensitivity of these metrics across numerous benchmarks, drawing general conclusions for practitioners.
Abstract: Fairness metrics are a core tool in the fair machine learning (FairML) literature, used to determine whether ML models are, in some sense, “fair.” Real-world data, however, are typically plagued by various measurement biases and other violated assumptions, which can render fairness assessments meaningless. We adapt tools from causal sensitivity analysis to the FairML context, providing a general framework which (1) accommodates effectively any combination of fairness metric and bias that can be posed in the “oblivious setting”; (2) allows researchers to investigate combinations of biases, resulting in non-linear sensitivity; and (3) enables flexible encoding of domain-specific constraints and assumptions. Employing this framework, we analyze the sensitivity of the most common parity metrics under 3 varieties of classifier across 14 canonical fairness datasets. Our analysis reveals the striking fragility of fairness assessments to even minor dataset biases. We show that causal sensitivity analysis provides a powerful and necessary toolkit for gauging the informativeness of parity metric evaluations. Our repository is \href{https://github.com/Jakefawkes/fragile_fair}{available here}.
Supplementary Material: pdf
Submission Number: 1286