Mean Estimation with User-level Privacy under Data Heterogeneity

Published: 04 Nov 2021, Last Modified: 15 May 2023, PRIML 2021 Poster, Readers: Everyone
Keywords: differential privacy, mean estimation, heterogeneous data
TL;DR: We provide a framework for privately estimating the population mean from heterogeneous user data, and show that our estimator is near-optimal.
Abstract: A key challenge for data analysis in the federated setting is that user data is heterogeneous, i.e., it cannot be assumed to be sampled from the same distribution. Further, in practice, different users may possess vastly different numbers of samples. In this work, we propose a simple model of heterogeneous user data that differs in both distribution and quantity of data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate near asymptotic optimality of our estimator among nearly unbiased estimators. In particular, while the optimal non-private estimator can be shown to be linear, we show that privacy constrains us to use a non-linear estimator.
Paper Under Submission: The paper is NOT under submission at NeurIPS
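
To make the abstract's contrast between linear and non-linear estimators concrete, the following is a minimal sketch of a generic user-level private mean, not the estimator proposed in the paper. It assumes each user's local mean is clipped to a hypothetical range [low, high] (the non-linear step), that users are weighted equally regardless of sample count, and that Laplace noise is calibrated to the replace-one-user sensitivity (high - low) / n. The names `clip` and `private_mean` are illustrative only.

```python
import random


def clip(value, low, high):
    """Project value onto [low, high]; this is the non-linear step."""
    return max(low, min(high, value))


def private_mean(user_samples, epsilon, low=0.0, high=1.0, rng=None):
    """Sketch of an epsilon-DP (user-level) mean via clipping plus Laplace noise.

    Assumptions (not from the paper): equal weight per user, a fixed and
    public number of users, and a known clipping range [low, high].
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if low >= high:
        raise ValueError("low must be strictly less than high")
    if not user_samples or any(len(s) == 0 for s in user_samples):
        raise ValueError("need at least one user, each with at least one sample")

    rng = rng or random.Random()
    n = len(user_samples)

    # One clipped local mean per user, so a user's influence is bounded
    # no matter how many samples that user holds.
    clipped = [clip(sum(s) / len(s), low, high) for s in user_samples]
    average = sum(clipped) / n

    # Replacing one user's entire dataset moves the average by at most
    # (high - low) / n, which sets the Laplace noise scale.
    scale = (high - low) / (n * epsilon)

    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return average + noise
```

For example, `private_mean([[0.2, 0.4], [0.9], [5.0]], epsilon=1.0)` clips the third user's mean from 5.0 to 1.0 before averaging. Equal weighting ignores the differing sample counts that the abstract highlights; choosing weights and clipping under heterogeneity is the question the paper addresses.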