Keywords: Differential Privacy, Fairness, Subpopulations
TL;DR: Fairness can be at odds with privacy in long-tailed data
Abstract: As machine learning algorithms are increasingly deployed on sensitive data in critical decision-making processes, it is important that they are also private and fair. When the data comprises multiple small subpopulations in a long-tailed distribution, we prove that private learning algorithms with high average accuracy incur high error on the minority groups with high probability. We further prove that relaxing the overall accuracy requirement can yield good fairness even under strict privacy constraints. We then provide an extensive set of experiments demonstrating how our theoretical results are reflected in a variety of differentially private algorithms (DP-SGD and DP-Random Forests) on synthetic, real-world vision (CIFAR-10 and CelebA), and tabular (Law School) datasets.
Supplementary Material: zip