When Data Can't Meet: Estimating Correlation Across Privacy Barriers

Published: 18 Sept 2025, Last Modified: 06 Jan 2026
Venue: NeurIPS 2025 spotlight
License: CC BY 4.0
Keywords: differential privacy; distributed data; minimax optimality
TL;DR: We study correlation estimation from bivariate data that is split vertically, i.e., component-wise, across two servers with privacy constraints.
Abstract: We consider the problem of estimating the correlation of two random variables $X$ and $Y$, where the pairs $(X,Y)$ are not observed together, but are instead separated coordinate-wise at two servers: server 1 contains all the $X$ observations, and server 2 contains the corresponding $Y$ observations. In this vertically distributed setting, we assume that each server has its own privacy constraints, so that it can share only suitably privatized statistics of its own component observations. We consider differing privacy budgets $(\varepsilon_1,\delta_1)$ and $(\varepsilon_2,\delta_2)$ for the two servers and determine the minimax optimal rates for correlation estimation under both non-interactive and interactive mechanisms. We also provide correlation estimators that achieve these rates and further develop inference procedures, namely confidence intervals, for the estimated correlations. The resulting rate, expressed in terms of the sample size $n$ and the budgets $\varepsilon_1$, $\varepsilon_2$, is strictly slower than the usual central privacy estimation rates. More interestingly, we find that the interactive mechanism is always better than its non-interactive counterpart whenever the two privacy budgets are different. Results from extensive numerical experiments support our theoretical findings.
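To make the vertically distributed setting concrete, the following is a minimal sketch of a naive non-interactive baseline; it is not the estimator proposed in the paper. It assumes, purely for illustration, that $X$ and $Y$ take values in $\{-1,+1\}$ with mean zero (so the correlation equals $E[XY]$), that each server releases every observation with independent Laplace noise (pure $\varepsilon$-DP per record, i.e. $\delta_1=\delta_2=0$), and that an analyst averages the products of the noisy pairs. All function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, epsilon, rng, bound=1.0):
    # Assumption: each record is clipped to [-bound, bound], so changing one
    # record moves its released value by at most 2 * bound (the sensitivity).
    scale = 2.0 * bound / epsilon
    return [max(-bound, min(bound, v)) + laplace_noise(scale, rng) for v in values]


def naive_correlation(x_priv, y_priv):
    # The noise is zero-mean and independent across servers, so the mean of
    # the products is unbiased for E[XY]; truncate to the valid range [-1, 1].
    mean_prod = sum(a * b for a, b in zip(x_priv, y_priv)) / len(x_priv)
    return max(-1.0, min(1.0, mean_prod))


def simulate_sign_pairs(n, rho, rng):
    # Illustrative data: X is a uniform sign, Y agrees with X with
    # probability (1 + rho) / 2, which gives E[XY] = rho.
    xs, ys = [], []
    for _ in range(n):
        x = rng.choice([-1.0, 1.0])
        y = x if rng.random() < (1.0 + rho) / 2.0 else -x
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = simulate_sign_pairs(200_000, 0.5, rng)
    x_priv = privatize(x, epsilon=2.0, rng=rng)  # server 1, budget eps_1
    y_priv = privatize(y, epsilon=1.0, rng=rng)  # server 2, budget eps_2
    print(naive_correlation(x_priv, y_priv))
```

In this sketch the two servers never communicate with each other, which corresponds to the non-interactive case; an interactive mechanism would let one server's release depend on the other's privatized output.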
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 24381