On the Pitfalls of Label Differential Privacy

Andres Munoz Medina; Robert Istvan Busa-Fekete; Umar Syed; Sergei Vassilvitskii

Published: 22 Oct 2021, Last Modified: 05 May 2023 | NeurIPS 2021 Workshop LatinX in AI, Oral | Readers: Everyone

Keywords: Differential privacy, distribution learning

TL;DR: We present a framework for analyzing the true privacy leakage of releasing data with label differential privacy.

Abstract: We study the privacy limitations of label differential privacy (label DP), which has emerged as an intermediate trust model between local and central differential privacy: only the label of each training example is protected, and the features are assumed to be public. We show that the guarantees provided by label DP are significantly weaker than they appear, because an adversary can "un-noise" the perturbed labels. Formally, we show that the privacy loss is closely connected to Jeffreys' divergence between the conditional distributions for positive and negative labels, which allows an explicit formulation of the trade-off between utility and privacy in this setting. Our results suggest how to select public features that optimize this trade-off. Nevertheless, we show that there is no free lunch: the instances where label DP guarantees are strong are exactly those where a good classifier does not exist. We complement the negative results with a non-parametric estimator for the true privacy loss, and apply our techniques to large-scale benchmark data to demonstrate how to achieve a desired privacy protection.
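The "un-noising" idea can be illustrated with a minimal sketch. It assumes binary randomized response as the label DP mechanism and an adversary who already has a prior P(y = 1 | x) from the public features. Neither assumption is taken from the abstract. The function names are hypothetical, and reading the "conditional distributions" as discrete feature distributions given each label is also an assumption made only for this illustration.

```python
import math

def keep_probability(epsilon):
    # Binary randomized response (assumed mechanism): report the true
    # label with probability e^eps / (1 + e^eps), otherwise flip it.
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def posterior_positive(prior, noisy_label, epsilon):
    # Bayes' rule: adversary's belief that the true label is 1 after
    # seeing the noisy label, given a prior P(y = 1 | x).
    keep = keep_probability(epsilon)
    like_pos = keep if noisy_label == 1 else 1.0 - keep
    like_neg = 1.0 - keep if noisy_label == 1 else keep
    numerator = prior * like_pos
    return numerator / (numerator + (1.0 - prior) * like_neg)

def jeffreys_divergence(p, q):
    # Symmetrized KL divergence KL(p||q) + KL(q||p) for two discrete
    # distributions with strictly positive entries.
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

if __name__ == "__main__":
    eps = math.log(3)  # keep probability 0.75
    # Uninformative features: the posterior is set by the mechanism alone.
    print(posterior_positive(0.5, 1, eps))
    # Highly predictive features: a reported label of 0 barely moves the belief.
    print(posterior_positive(0.99, 0, eps))
    # Well-separated conditional distributions have a large divergence.
    print(jeffreys_divergence([0.9, 0.1], [0.1, 0.9]))
```

In this toy setting, the more the public features reveal about the label, the less the label noise constrains the adversary's posterior, which is the tension between utility and privacy that the abstract describes.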
