Abstract: Training machine learning models for fair decisions faces two key challenges: The *fairness-accuracy trade-off* results from enforcing fairness constraints, which weakens a model's predictive performance compared to that of an unconstrained model. The incompatibility of different fairness metrics poses another trade-off, also known as the *impossibility theorem*. Recent work identifies bias in the observed data as a possible root cause and shows that fairness and predictive performance are in accord when predictive performance is measured on unbiased data. We offer a causal explanation for these findings using the framework of the FiND (fictitious and normatively desired) world, a "fair" world in which protected attributes have no causal effect on the target variable. Our contribution is twofold: First, we unify insights from previously separate lines of research and establish a new theoretical link demonstrating that both the fairness-accuracy trade-off and the trade-off between conflicting fairness metrics are naturally resolved in this FiND world. Second, we propose *appFiND*, a new method for evaluating the quality of FiND world approximations obtained via pre-processing in real-world scenarios where the true FiND world is not observable. In simulations and empirical studies, we demonstrate that such pre-processing methods successfully approximate the FiND world and resolve both trade-offs. Our results provide actionable solutions for practitioners to achieve fairness and high predictive performance simultaneously.