Information-theoretic Quantification of Inherent Discrimination Bias in Training Data for Supervised Learning
Keywords: Algorithmic fairness, Model-agnostic discrimination bias quantification, Fair data engineering, Information-theoretic measures, Partial information decomposition, Shapley value function
TL;DR: We propose a novel framework that uses information-theoretic measures and game-theoretic aggregation to quantify the marginal impact of dataset features on the discrimination bias of downstream learning models, without access to their predictions.
Abstract: Algorithmic fairness research has mainly focused on adapting learning models to mitigate discrimination based on protected attributes, yet understanding inherent biases in training data remains largely unexplored. Quantifying these biases is crucial for informed data engineering, as data mining and model development often occur separately. We address this by developing an information-theoretic framework to quantify the marginal impacts of dataset features on the discrimination bias of downstream predictors. We postulate a set of desired properties for candidate discrimination measures and derive measures that (partially) satisfy them. Distinct sets of these properties align with distinct fairness criteria like demographic parity or equalized odds, which we show can be in disagreement and not simultaneously satisfied by a single measure. We use the Shapley value to determine individual features' contributions to overall discrimination, and prove its effectiveness in eliminating redundancy. We validate our measures through a comprehensive empirical study on numerous real-world and synthetic datasets. For synthetic data, we use a parametric linear structural causal model to generate diverse data correlation structures. Our analysis provides empirically validated guidelines for selecting discrimination measures based on data conditions and fairness criteria, establishing a robust framework for quantifying inherent discrimination bias in data.
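Illustrative sketch: The Shapley aggregation step mentioned in the abstract can be illustrated on a toy discrete dataset. The sketch below is not the paper's method. It assumes, purely for illustration, that the discrimination of a feature subset S is measured by the empirical mutual information I(X_S; A) between those features and a protected attribute A; the abstract does not specify the measures actually derived (they may rely on partial information decomposition). The function names `mutual_information` and `shapley_values` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def mutual_information(rows, cols, protected):
    """Empirical I(X_S; A) in bits for discrete data (assumed value function)."""
    if not cols:
        return 0.0  # the empty feature set carries no information about A
    n = len(rows)
    xs = [tuple(r[c] for c in cols) for r in rows]
    joint = Counter(zip(xs, protected))
    px = Counter(xs)
    pa = Counter(protected)
    return sum(
        (c / n) * log2(c * n / (px[x] * pa[a]))
        for (x, a), c in joint.items()
    )


def shapley_values(rows, protected, n_features):
    """Exact Shapley value of each feature under v(S) = I(X_S; A)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(n_features):
            # Standard Shapley weight for coalitions of size k
            weight = 1.0 / (n_features * comb(n_features - 1, k))
            for subset in combinations(others, k):
                gain = (
                    mutual_information(rows, subset + (i,), protected)
                    - mutual_information(rows, subset, protected)
                )
                phi[i] += weight * gain
    return phi


# Toy data: X0 and X1 both copy the protected attribute A; X2 is independent.
protected = [0, 0, 1, 1]
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
phi = shapley_values(rows, protected, 3)
print(phi)
```

In this toy case the two redundant copies of A split the single bit of information equally, and the independent feature receives zero, so the per-feature contributions sum to the discrimination of the full feature set rather than double-counting the redundant information.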
Submission Number: 71