Keywords: Label Differential Privacy, Regression, Response Privacy, RPWithPrior
Abstract: With the wide application of machine learning techniques in practice, privacy preservation has gained increasing attention. Protecting user privacy with minimal accuracy loss is a fundamental task in the data analysis and mining community. In this paper, we focus on regression tasks under $\epsilon$-label differential privacy guarantees. Some existing methods for regression with $\epsilon$-label differential privacy, such as the RR-On-Bins mechanism and its variant, discretized the output space into finite bins and then applied randomized response (RR) algorithms. To determine these finite bins efficiently, the authors rounded the original responses down to integer values. However, such operations do not align well with real-world scenarios, where responses are often continuous. To overcome these limitations, we model both original and randomized responses as {\it continuous} random variables, avoiding discretization entirely. Our approach estimates an optimal interval for randomized responses and introduces new algorithms for scenarios where a prior is either known or unknown. Additionally, we prove that our algorithm, RPWithPrior, guarantees $\epsilon$-label differential privacy. Numerical results demonstrate that our approach outperforms the Gaussian, Laplace, Staircase, RRonBins, and Unbiased mechanisms on the Communities and Crime, Criteo Sponsored Search Conversion Log, and California Housing datasets, as well as on some simulated datasets.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10339