Abstract: The use of machine learning (ML) in high-stakes societal decisions
has encouraged the consideration of fairness throughout the ML
lifecycle. Although data integration is one of the primary steps to
generate high-quality training data, most of the fairness literature
ignores this stage. In this work, we consider fairness in the integration
component of data management, aiming to identify features
that improve prediction without adding any bias to the dataset.
We work under the causal fairness paradigm [45]. Without requiring
the underlying structural causal model a priori, we propose
an approach to identify a sub-collection of features that ensure
fairness of the dataset by performing conditional independence
tests between different subsets of features. We use group testing to
reduce the complexity of the approach. We theoretically prove the
correctness of the proposed algorithm and show that a sublinear number of
conditional independence tests is sufficient to identify these variables.
We perform a detailed empirical evaluation on real-world datasets
to demonstrate the efficacy and efficiency of our technique.
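
As a rough illustration of why group testing can reduce the number of tests, the sketch below applies adaptive binary splitting to a list of candidate features. It is not the algorithm proposed in this work: `find_flagged` and `make_toy_oracle` are hypothetical names, and the oracle is a toy stand-in for a conditional independence test on a group of features. The sketch assumes a group passes the test exactly when every feature in it passes, which is the standard group-testing assumption and may not hold for every independence test.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def find_flagged(
    items: Sequence[str],
    group_is_clean: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the flagged items using adaptive binary-splitting group testing.

    Assumes group_is_clean(group) is True exactly when no item in the
    group is flagged (hypothetical stand-in for a group-level test).
    """
    if not items:
        return []
    # One test can clear an entire group at once.
    if group_is_clean(items):
        return []
    # A failing group of size one pinpoints a flagged item.
    if len(items) == 1:
        return [items[0]]
    # Otherwise split the failing group and recurse on both halves.
    mid = len(items) // 2
    left = find_flagged(items[:mid], group_is_clean)
    right = find_flagged(items[mid:], group_is_clean)
    return left + right


def make_toy_oracle(
    flagged: Set[str],
) -> Tuple[Callable[[Sequence[str]], bool], Dict[str, int]]:
    """Build a toy oracle that also counts how many tests were run."""
    calls = {"n": 0}

    def group_is_clean(group: Sequence[str]) -> bool:
        calls["n"] += 1
        return not any(item in flagged for item in group)

    return group_is_clean, calls


# Toy data: 16 candidate features, one of which is flagged.
features = [f"f{i}" for i in range(16)]
oracle, calls = make_toy_oracle({"f3"})
result = find_flagged(features, oracle)
print(result, "found with", calls["n"], "tests for", len(features), "features")
```

When only a few features are flagged, the splitting procedure needs far fewer tests than checking each feature individually; when most features are flagged, the advantage disappears.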