TL;DR: We propose a privacy-aware subsampling scheme that minimizes the variance for private mean estimation.
Abstract: This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance of general private mean estimation under the Laplace, Discrete Laplace (DLap), and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; consequently, to achieve the same final privacy guarantee for each group, the nominal privacy budget for each group must be chosen according to its subsampling rate. Because they ignore the effect of DP, traditional stratified sampling strategies risk significant variance inflation. We formulate the optimal survey design as an optimization problem in which we determine the subsampling size for each group so as to minimize the variance of the resulting estimator. We establish strong convexity of the variance objective, propose an efficient algorithm to identify the integer-optimal design, and offer insights into the structure of the optimal design.
Lay Summary: This work develops a new survey method for collecting social data that protects individual privacy while ensuring the most accurate results. By carefully choosing how many responses to gather from each group of people, the method provides an optimal estimate of the private mean.
Link To Code: https://github.com/garyUAchen/DP_OptimSurvey
Primary Area: Social Aspects->Security
Keywords: Differential Privacy, Experimental Design, Stratified Sampling, Subsampling, Privacy Amplification
Submission Number: 8500
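Illustrative note: the abstract's point that the nominal privacy budget must depend on the subsampling rate can be sketched with the standard amplification-by-subsampling bound, under which an eps-DP mechanism applied to a subsample drawn at rate q satisfies log(1 + q(exp(eps) - 1))-DP. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the use of this particular bound, and the bounded-data Laplace variance formula are all chosen here for illustration and may differ from the paper's exact setup.

```python
import math


def amplified_epsilon(nominal_eps: float, rate: float) -> float:
    """Final privacy level after subsampling at `rate` (standard bound, assumed here)."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return math.log1p(rate * math.expm1(nominal_eps))


def nominal_epsilon(target_eps: float, rate: float) -> float:
    """Nominal budget whose amplified guarantee equals `target_eps`.

    Inverts the bound above: eps = log(1 + (exp(target_eps) - 1) / rate).
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return math.log1p(math.expm1(target_eps) / rate)


def laplace_noise_variance(n: int, eps: float, value_range: float = 1.0) -> float:
    """Variance of Laplace noise added to a mean of n values in a bounded range.

    Assumes sensitivity value_range / n, so the Laplace scale is
    b = value_range / (n * eps) and the noise variance is 2 * b**2.
    """
    b = value_range / (n * eps)
    return 2.0 * b * b


if __name__ == "__main__":
    # Hypothetical strata: (population size, sample size). Values are made up.
    strata = [(1000, 100), (500, 250)]
    target = 1.0  # same final privacy guarantee for every group
    for population, sample in strata:
        rate = sample / population
        eps = nominal_epsilon(target, rate)
        var = laplace_noise_variance(sample, eps)
        print(f"rate={rate:.2f} nominal_eps={eps:.3f} noise_var={var:.3e}")
```

A smaller subsampling rate allows a larger nominal budget for the same final guarantee, but also means fewer responses, which is the trade-off the optimal design balances.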