Randomized Response Has No Disparate Impact on Model Accuracy

Published: 01 Jan 2023 · Last Modified: 23 Feb 2025 · IEEE Big Data 2023 · CC BY-SA 4.0
Abstract: Differential privacy, the current gold standard for data anonymization and protection, is commonly believed to degrade utility and exacerbate unfairness across demographic groups when used to train a private machine learning model. In contrast to this long-held perception, however, recent work has shown that local differential privacy, a variant of differential privacy in which users perturb their data on-device before it is aggregated, can surprisingly improve fairness measures without significantly affecting the utility of the underlying machine learning model. Motivated by this previous work, in this paper we further show that applying randomized response, a popular local differential privacy method, does not incur disparate impact on the private model's accuracy across demographic groups. Specifically, through a thorough empirical analysis in which we apply randomized response to the labels, the features, or both, across multiple data modalities and model architectures, we show that the absolute difference in utility loss between demographic groups is negligible.
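For concreteness, below is a minimal sketch of the standard k-ary randomized response mechanism applied to labels. This is an illustration of the general technique, not the paper's exact experimental pipeline; the function and parameter names are hypothetical. With privacy parameter epsilon, the true label is kept with probability e^eps / (e^eps + k - 1) and otherwise replaced by one of the other k - 1 labels uniformly at random, which satisfies epsilon-local differential privacy.

```python
import numpy as np

def randomized_response(labels, epsilon, num_classes=2, rng=None):
    """Epsilon-LDP k-ary randomized response on categorical labels.

    Keeps each true label with probability e^eps / (e^eps + k - 1);
    otherwise reports one of the other k - 1 labels uniformly at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(labels.shape) < p_keep
    # For flipped entries, shift by a random offset in [1, k) modulo k,
    # which yields a uniform draw over the k - 1 *other* labels.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    flipped = (labels + offsets) % num_classes
    return np.where(keep, labels, flipped)

# Example: privatize binary labels at epsilon = 1 before training.
y = np.array([0, 1, 1, 0, 1])
y_private = randomized_response(y, epsilon=1.0)
```

The same mechanism can be applied independently to categorical features (or to both features and labels), matching the three perturbation settings studied in the paper.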