Population Level Privacy Leakage in Binary Classification with Label Noise

Robert Istvan Busa-Fekete; Andres Munoz Medina; Umar Syed; Sergei Vassilvitskii

Published: 04 Nov 2021, Last Modified: 15 May 2023, PRIML 2021 Oral, Readers: Everyone

Keywords: binary classification, local differential privacy, label noise

Abstract: We study the privacy limitations of label differential privacy (label DP). Label DP has emerged as an intermediate trust model between local and central differential privacy, in which only the label of each training example is protected and the features are assumed to be public. We show that the guarantees provided by label DP are significantly weaker than they appear, because an adversary can "un-noise" the perturbed labels. Formally, we show that the privacy loss is closely connected to Jeffreys' divergence between the conditional distributions for positive and negative labels, which allows an explicit formulation of the trade-off between utility and privacy in this setting. Our results suggest how to select public features that optimize this trade-off. However, we also show that there is no free lunch: the instances where label DP guarantees are strong are exactly those where a good classifier does not exist. We complement the negative results with a non-parametric estimator for the true privacy loss, and apply our techniques to large-scale benchmark data to demonstrate how to achieve a desired level of privacy protection.
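
The "un-noising" idea can be illustrated with a minimal sketch. It assumes the labels are protected by binary randomized response, which is a common label DP mechanism but is not named in the abstract, and that the adversary knows P(y = 1 | x) from the public features. The function names are hypothetical, and the Jeffreys divergence is computed for discrete distributions using its standard definition, KL(p || q) + KL(q || p), not a formula taken from the paper.

```python
import math


def flip_probability(epsilon: float) -> float:
    """Probability that binary randomized response flips a label
    under an epsilon-DP guarantee (assumed mechanism)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def posterior_positive(prior_pos: float, noisy_label: int, epsilon: float) -> float:
    """Adversary's belief P(y = 1 | noisy label, x) via Bayes' rule.

    prior_pos is P(y = 1 | x), which the adversary is assumed to
    know from the public features.
    """
    flip = flip_probability(epsilon)
    like_pos = 1.0 - flip if noisy_label == 1 else flip  # P(noisy | y = 1)
    like_neg = flip if noisy_label == 1 else 1.0 - flip  # P(noisy | y = 0)
    numerator = like_pos * prior_pos
    return numerator / (numerator + like_neg * (1.0 - prior_pos))


def jeffreys_divergence(p, q) -> float:
    """Symmetrized KL divergence between two discrete distributions
    with strictly positive entries: sum_i (p_i - q_i) * log(p_i / q_i)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    eps = math.log(3.0)  # flip probability 0.25
    # Uninformative features: the noisy label is all the adversary has.
    print(posterior_positive(0.5, 1, eps))
    # Highly informative features: a noisy label of 0 barely shifts
    # the adversary's belief that the true label is 1.
    print(posterior_positive(0.99, 0, eps))
    # Feature distributions under each label that differ strongly.
    print(jeffreys_divergence([0.9, 0.1], [0.1, 0.9]))
```

In this sketch, when the features are uninformative (prior 0.5) the adversary's posterior is determined by the mechanism alone, whereas with highly predictive features the adversary remains nearly certain of the true label whatever the noisy label says. This mirrors the tension the abstract describes between classifier quality and label privacy.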

Paper Under Submission: The paper is currently under submission at NeurIPS
