Privacy Budget Tailoring in Private Data Analysis

Published: 22 Dec 2023, Last Modified: 22 Dec 2023. Accepted by TMLR.
Abstract: We consider the problem of learning differentially private linear and logistic regression models that do not exhibit disparate performance for minority groups in the data. Small datasets pose a challenging regime for differential privacy: satisfying differential privacy while learning models from data can yield models with worse accuracy for subgroups that are minorities in size. To address this challenge, inspired by Abowd & Schmutte (2018), we propose (i) to systematically tailor the privacy budget to the different groups, and (ii) to use linear optimization oracles in a grid to optimize Lagrangian objectives that correspond to fair learning and optimization. We present efficient differentially private algorithms for linear and logistic regression subject to fairness constraints (e.g., bounded group loss) that allocate the privacy budget based on the private standard error of each subgroup in the data. Consequently, the formulation reduces the amount of noise added for minority groups, which leads to more accurate models for those groups. We validate the proposed group-aware budget allocation method on synthetic and real-world datasets, where we show significant reductions in prediction error for the smallest groups while still preserving sufficient privacy to protect the minority group from re-identification attacks. In addition, we provide sample complexity lower bounds for our problem formulation.
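To make the idea of group-aware budget allocation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that each group's standard error has already been estimated privately, that the total budget is split in proportion to those standard errors (a hypothetical rule chosen only for illustration), and that basic composition applies, so the per-group budgets sum to the total. The names `allocate_budget` and `laplace_scale` are illustrative and do not come from the paper.

```python
def allocate_budget(private_std_errors, total_epsilon):
    """Split total_epsilon across groups in proportion to each group's
    (already privately estimated) standard error.

    Assumption: the proportional rule is illustrative only; the paper's
    actual allocation rule may differ.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not private_std_errors:
        raise ValueError("at least one group is required")
    if any(se <= 0 for se in private_std_errors.values()):
        raise ValueError("standard errors must be positive")

    total_se = sum(private_std_errors.values())
    # Under basic composition the per-group budgets add up to total_epsilon.
    return {
        group: total_epsilon * se / total_se
        for group, se in private_std_errors.items()
    }


def laplace_scale(sensitivity, epsilon):
    """Scale parameter of Laplace noise for a query with the given L1 sensitivity."""
    return sensitivity / epsilon


# Hypothetical standard errors: the smaller group is noisier to estimate.
std_errors = {"majority": 0.1, "minority": 0.4}
budgets = allocate_budget(std_errors, total_epsilon=1.0)
scales = {g: laplace_scale(1.0, eps) for g, eps in budgets.items()}
```

In this sketch the group with the larger standard error receives the larger share of the budget, so the Laplace noise added to its statistics has a smaller scale. That is the qualitative effect the abstract describes.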
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1) **Additional Clarifications**: Explained the use of "sensitive" and "non-sensitive" attributes. Clarified our use of basic composition and how privacy budgets are used to sample from $\mathbb{R}^{d\times d}$. Clarified the use of the OLS (Ordinary Least Squares) model and the generation process. Restructured Sec. 6.1 for clarity's sake (it is no longer its own subsection). Grouped all experimental results about group sizes and the number of groups into a single paragraph. Added a further comparison to the work of Abowd & Schmutte (2018). Further clarified how Lemma 3.1 motivates the paragraph "Why allocate privacy budget based on standard errors?". Clarified the importance of Section 5 and how it connects to the rest of the paper. Moved the "Helper Lemmas" to the appendix. Moved some empirical evidence from the appendix back to the main paper. Provided the clarifications promised in the post-rebuttal discussion.
2) **Experimental Details**: Moved the experiments on larger groups from the appendix to the main paper. The moved tables (Tables 1, 2, and 3) illustrate that as the privacy parameter increases, the MSPE, as expected, generally decreases for all groups. Included further results and accompanying discussion showing how the privacy budget split varies with the standard errors of the subgroups. Added experiments illustrating why the standard errors can provide information that is impossible to obtain from the dataset sizes alone.
3) **Additional Complexity Analysis**: Added sample complexity lower bounds that essentially illustrate how large datasets need to be for a certain accuracy guarantee to be met. Also clarified the time complexity of the algorithms.
Supplementary Material: zip
Assigned Action Editor: ~Joonas_Jälkö1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1578