TL;DR: We investigate the variance penalty incurred when estimating causal effects from differentially private data and introduce a novel mechanism, Cluster-DP, that leverages cluster information to reduce this penalty.
Abstract: Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for quantifying an algorithm's privacy guarantees, can encourage participants to share their responses without the risk of de-anonymization. However, many mechanisms achieve differential privacy by adding noise to the original dataset, which reduces the precision of causal effect estimation. This introduces a fundamental trade-off between privacy and variance when performing causal analyses on differentially private data.
In this work, we propose a new differentially private mechanism, \textsc{Cluster-DP}, which leverages a given cluster structure in the data to improve the privacy-variance trade-off. While our results apply to any clustering, we demonstrate that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees. Finally, we evaluate the theoretical and empirical performance of our \textsc{Cluster-DP} algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version.
Lay Summary: Our work addresses a common challenge for online services: experimentally measuring how users respond to new features or interventions while keeping potentially sensitive outcome data confidential. We introduce a framework for estimating causal effects from differentially private outcomes. When users or items can be naturally clustered by non-private attributes, such as product categories or demographic characteristics, our algorithm adds carefully calibrated statistical noise: rather than reporting each individual's true outcome, it substitutes an outcome sampled from a privatized distribution specific to that individual's cluster. We show that the resulting estimator recovers the causal effect of an intervention more accurately than alternatives that ignore this structure, while maintaining rigorous individual privacy guarantees.
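To make the lay summary's description concrete, here is a minimal, hypothetical sketch of the cluster-based resampling idea for discrete outcomes. It is not the paper's actual \textsc{Cluster-DP} mechanism or its privacy accounting: the function name, the Laplace noise scale, and the discrete-outcome assumption are all illustrative choices for this sketch.

```python
import numpy as np

def cluster_dp_release(outcomes, clusters, levels, epsilon, rng=None):
    """Illustrative cluster-based privatization of discrete outcomes.

    Per cluster: build an outcome histogram, privatize it with Laplace
    noise (a standard epsilon-DP histogram mechanism; replacing one
    record changes the histogram by at most 2 in L1 norm, hence the
    2/epsilon scale), clip and renormalize it into a distribution, then
    resample each member's reported outcome from that distribution.
    Sampling from the noisy histogram is post-processing, so it adds no
    further privacy cost. A sketch only, not the paper's mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    outcomes, clusters = np.asarray(outcomes), np.asarray(clusters)
    levels = np.asarray(levels)
    released = np.empty_like(outcomes)
    for c in np.unique(clusters):
        mask = clusters == c
        # Histogram of this cluster's true outcomes over the outcome levels.
        counts = (outcomes[mask][:, None] == levels[None, :]).sum(axis=0).astype(float)
        noisy = counts + rng.laplace(scale=2.0 / epsilon, size=levels.size)
        probs = np.clip(noisy, 0.0, None)
        total = probs.sum()
        # Fall back to a uniform distribution if the noise wiped out all mass.
        probs = probs / total if total > 0 else np.full(levels.size, 1.0 / levels.size)
        released[mask] = rng.choice(levels, size=int(mask.sum()), p=probs)
    return released

# Example: binary outcomes in three clusters, privatized at epsilon = 1.
y = np.array([0, 1, 1, 0, 1, 1, 1, 0, 0, 0])
g = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_private = cluster_dp_release(y, g, levels=[0, 1], epsilon=1.0)
```

Because the released data keep the original one-outcome-per-individual format, a standard downstream estimator (e.g., a difference in means between treatment arms) can be run on the privatized outcomes unchanged; the better the clusters predict outcomes, the less the resampling distorts them.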
Primary Area: General Machine Learning->Causality
Keywords: causal inference, differential privacy, clustering
Submission Number: 8178