Distributional Privacy for Data Sharing

03 Oct 2022 (modified: 05 May 2023), NeurIPS 2022 SyntheticData4ML, Readers: Everyone
Keywords: privacy, distributional privacy, data release, data sharing, synthetic data
TL;DR: A new privacy framework for handling distributional privacy concerns in data sharing scenarios.
Abstract: Data sharing between different parties has become an important engine powering modern research and development processes. An important class of privacy concerns in data sharing relates to the underlying distribution of the data. For example, the total traffic volume in data from a networking company reveals the scale of its business. Unfortunately, existing privacy frameworks do not adequately address this class of concerns. In this paper, we propose distributional privacy, a framework for analyzing and addressing these distributional privacy concerns in data sharing scenarios. Distributional privacy is applicable in multiple data sharing settings, including synthetic data release. Theoretically, we analyze lower and upper bounds on the privacy-distortion trade-off. Practically, we propose data release mechanisms for addressing distributional privacy concerns, and demonstrate that they achieve better privacy-distortion trade-offs than alternative privacy mechanisms on real-world datasets.
