Keywords: fairness, distribution shift, minimax estimation, linear regression, finite sample analysis
TL;DR: We show how finite-sample estimation and group heterogeneity distort the fairness-accuracy frontier, and we provide minimax-optimal estimators and optimal sampling strategies that make fairness-accuracy trade-offs reliable under limited data.
Abstract: Machine learning models must balance accuracy and fairness, but these goals often conflict, particularly when data come from multiple demographic groups with heterogeneous distributions. A useful tool for understanding this trade-off is the fairness-accuracy (FA) frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. Prior analyses of the FA frontier provide a full characterization under the assumption of complete knowledge of the population distributions, which is an unrealistic ideal. We study the FA frontier in the finite-sample regime, showing how sampling error and heterogeneity across group distributions distort the empirical frontier relative to its population counterpart. In particular, we derive minimax-optimal estimators that depend on the designer's knowledge of the covariate distribution. For each estimator, we characterize how finite-sample effects affect each group's risk asymmetrically, and we identify optimal strategies for allocating samples across groups. Our results turn the FA frontier from a theoretical construct into a practical tool for policymakers and practitioners, who must often design algorithms with limited data.
Submission Number: 108