On Differentially Private U Statistics

Kamalika Chaudhuri; Po-Ling Loh; Shourya Pandey; Purnamrita Sarkar

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, CC BY 4.0

Keywords: Differential Privacy, Statistics, Mean Estimation

TL;DR: We devise efficient algorithms for differentially private U-statistics in the central DP model, achieving nearly optimal error rates in various settings. Previously, this problem was studied in the local DP model and for U-statistics of degree 2.

Abstract: We consider the problem of privately estimating a parameter $\mathbb{E}[h(X_1,\dots,X_k)]$, where $X_1, X_2, \dots, X_k$ are i.i.d. data from some distribution and $h$ is a permutation-invariant function. Without privacy constraints, the standard estimators for this task are U-statistics. They arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and under mild conditions they are the unique minimum-variance unbiased estimators. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied in a black-box manner to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even $\Theta(1/n)$ rather than $O(1/n^2)$ in degenerate settings. To remedy this, we propose a new thresholding-based approach that reweights different subsets of the data using _local Hájek projections_. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
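To make the estimator above concrete, here is a minimal Python sketch. It is not the paper's thresholding method based on local Hájek projections. It computes a degree-k U-statistic by brute force, averaging a symmetric kernel h over all size-k subsets of the sample. It then releases that value with a naive Laplace mechanism. The function names `u_statistic` and `laplace_u_statistic` are hypothetical, and so are the clipping bounds `lo` and `hi`. The privacy argument assumes that neighboring datasets differ by replacing one sample, and that the kernel is clipped to a known range [lo, hi].

```python
import itertools
import math
import random
import statistics
from typing import Callable, Optional, Sequence


def u_statistic(xs: Sequence[float], h: Callable[..., float], k: int) -> float:
    """Exact degree-k U-statistic: the average of the symmetric kernel h over
    all size-k subsets of xs. Costs C(n, k) kernel evaluations."""
    n = len(xs)
    if n < k:
        raise ValueError("need at least k samples")
    total = sum(h(*subset) for subset in itertools.combinations(xs, k))
    return total / math.comb(n, k)


def laplace_u_statistic(
    xs: Sequence[float],
    h: Callable[..., float],
    k: int,
    lo: float,
    hi: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Naive epsilon-DP release of a U-statistic (illustrative baseline only).

    The kernel is clipped to the assumed range [lo, hi]. Replacing one sample
    changes only the C(n-1, k-1) subsets that contain it, out of C(n, k) total,
    so the global sensitivity is (k / n) * (hi - lo).
    """
    if rng is None:
        rng = random.Random()

    def clipped(*args: float) -> float:
        # Clipping enforces the bounded range the sensitivity bound relies on.
        return min(max(h(*args), lo), hi)

    estimate = u_statistic(xs, clipped, k)  # raises ValueError if n < k
    scale = k * (hi - lo) / (len(xs) * epsilon)
    # The difference of two i.i.d. Exponential(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return estimate + noise


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    # The kernel (x - y)^2 / 2 recovers the unbiased sample variance (degree 2).
    print(u_statistic(data, lambda a, b: (a - b) ** 2 / 2, 2))  # 7.0
    print(statistics.variance(data))  # 7.0
    # Indicator kernel with values in [0, 1], released with epsilon = 1.
    print(laplace_u_statistic(data, lambda a, b: float(a + b > 5), 2, 0.0, 1.0, 1.0, random.Random(0)))
```

Exact enumeration over all C(n, k) subsets is only practical for small n or k. The Laplace noise scale k(hi - lo)/(nε) ignores the kernel's actual variance, so this baseline is useful only as a point of comparison.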

Primary Area: Privacy

Submission Number: 14763
