Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing

Published: 05 Mar 2024, Last Modified: 04 May 2024 | PML | Everyone | CC BY 4.0
Keywords: privacy, data release, data sharing, synthetic data
TL;DR: A new privacy framework to address multi-secret distributional privacy concerns in data sharing.
Abstract: Data sharing enables critical advances in many research areas and business applications, but it may lead to inadvertent disclosure of sensitive summary statistics (e.g., mean, standard deviation). Existing efforts mainly focus on protecting a single confidential quantity, while in practice, data frequently involves a range of sensitive quantities. We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing. Specifically, we measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets. Within diverse data sharing paradigms, given an attacker’s objective spanning from inferring a subset to the entirety of summary statistic secrets, we systematically design and analyze tailored privacy metrics. Defining the distortion as the worst-case distance between the original and released data distribution, we analyze the tradeoff between privacy and distortion.
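Illustrative formalization: the abstract's two quantities can be written down as follows. This is only a sketch; all notation here is assumed for illustration and is not taken from the paper: \theta for the parameters of the original data distribution \omega_\theta, \theta' for the released parameters produced by the mechanism, g_1, ..., g_k for the secret summary statistics, \hat{g}_i for the attacker's estimators, \epsilon_i for the attacker's tolerances, and d for a distance between distributions. The "any" and "all" variants are two possible readings of an attacker objective ranging from a subset to the entirety of the secrets.

```latex
% Illustrative notation only (assumed, not the paper's definitions).

% Privacy risk when the attacker must recover ALL k secrets within tolerance:
\Pi_{\mathrm{all}}
  = \sup_{\hat{g}_1,\dots,\hat{g}_k}
    \Pr\!\Big( \bigcap_{i=1}^{k}
      \big\{ \lvert \hat{g}_i(\theta') - g_i(\theta) \rvert \le \epsilon_i \big\} \Big)

% Privacy risk when recovering ANY one secret within tolerance counts as success:
\Pi_{\mathrm{any}}
  = \sup_{\hat{g}_1,\dots,\hat{g}_k}
    \Pr\!\Big( \bigcup_{i=1}^{k}
      \big\{ \lvert \hat{g}_i(\theta') - g_i(\theta) \rvert \le \epsilon_i \big\} \Big)

% Distortion: worst-case distance between original and released distributions,
% taken over all pairs (\theta, \theta') that can occur under the mechanism:
\Delta = \sup_{(\theta,\,\theta')} d\big( \omega_{\theta},\, \omega_{\theta'} \big)
```

Under this reading, \Pi_{\mathrm{all}} \le \Pi_{\mathrm{any}} for any mechanism, since the intersection of the success events is contained in their union. The privacy-distortion tradeoff then asks how small the chosen risk can be made for a given bound on \Delta.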
Submission Number: 5