Differentially Private Optimizers Can Learn Adversarially Robust Models

Published: 21 Nov 2023, Last Modified: 21 Nov 2023. Accepted by TMLR.
Abstract: Machine learning models have shone in a variety of domains and attracted increasing attention from both the security and the privacy communities. One important yet worrying question is: Will training models under the differential privacy (DP) constraint have an unfavorable impact on their adversarial robustness? While previous works have postulated that privacy comes at the cost of worse robustness, we give the first theoretical analysis to show that DP models can indeed be robust and accurate, and sometimes even more robust than their naturally-trained non-private counterparts. We observe three key factors that influence the privacy-robustness-accuracy tradeoff: (1) hyper-parameters for DP optimizers are critical; (2) pre-training on public data significantly mitigates the accuracy and robustness drop; (3) the choice of DP optimizer makes a difference. With these factors set properly, we achieve 90\% natural accuracy, 72\% robust accuracy ($+9\%$ over the non-private model) under the $l_2(0.5)$ attack, and 69\% robust accuracy ($+16\%$ over the non-private model) with a pre-trained SimCLRv2 model under the $l_\infty(4/255)$ attack on CIFAR10 with $\epsilon=2$. In fact, we show both theoretically and empirically that DP models are Pareto optimal on the accuracy-robustness tradeoff. Empirically, the robustness of DP models is consistently observed across various datasets and models. We believe our encouraging results are a significant step towards training models that are private as well as robust.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We add a new paragraph (the first paragraph) in Section 3.2, explaining that extending our theorems to the non-fixed $w$ scenario is non-trivial. Specifically, we found it difficult to solve for or quantitatively describe $w$ under the DP regime, because DP modifies the optimization whereas robustness only modifies the objective function. That is, evaluating the DP $w$ for robustness requires studying a constrained optimization problem, which is much more difficult than the unconstrained optimization problem of the non-DP robust error. We also add comparisons with non-DP and DP adversarially trained (AT) models in Table 2 and Table 3, highlighted in cyan columns, along with discussion in the corresponding section. These comparisons are consistent with our previous analysis that DP models can be as robust as (or more robust than) non-DP models, in both the AT and naturally trained regimes. We also add Remark 5, which states that across various DP optimizers (including SGD and Adam), it is appropriate to use automatic clipping to normalize the per-sample gradients instead of clipping them. This allows us to remove the tuning of $R$ and achieve strong robustness and accuracy easily. We also modify the discussion section to accommodate our changes since the last submission. For example, DP+AT models were listed as a future direction, but this item has now been replaced, given that we already evaluate them in Table 2 and Table 3. We also polished the wording and formatting.
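The difference between clipping and normalizing per-sample gradients, as mentioned for Remark 5, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the `stability` constant, and the exact normalization form `1 / (norm + stability)` are assumptions made for illustration only.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a gradient given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def clip_factor(grad, R):
    """Standard per-sample clipping: rescale so the norm is at most R."""
    return min(1.0, R / (l2_norm(grad) + 1e-12))


def normalize_factor(grad, stability=0.01):
    """Automatic clipping (assumed form): rescale to below unit norm.

    No clipping threshold R appears here; `stability` is a hypothetical
    small constant that avoids division by zero.
    """
    return 1.0 / (l2_norm(grad) + stability)


def private_gradient(per_sample_grads, factor_fn, sensitivity, sigma, rng):
    """Sum rescaled per-sample gradients, add Gaussian noise, then average.

    `sensitivity` bounds the norm of each rescaled gradient: R for clipping,
    1 for normalization. The noise standard deviation is sigma * sensitivity.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        c = factor_fn(g)
        for j in range(dim):
            total[j] += c * g[j]
    noisy = [t + rng.gauss(0.0, sigma * sensitivity) for t in total]
    return [x / len(per_sample_grads) for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.02]]
    R = 1.0  # must be tuned when clipping
    print(private_gradient(grads, lambda g: clip_factor(g, R), R, 1.0, rng))
    # Normalization needs no R; every rescaled gradient has norm below 1.
    print(private_gradient(grads, normalize_factor, 1.0, 1.0, rng))
```

With clipping, small gradients pass through unchanged and only large ones are shrunk, so the result depends on the choice of $R$. With normalization, every per-sample gradient is rescaled to a similar magnitude, which is why $R$ no longer needs tuning in this sketch.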
Code: https://github.com/woodyx218/private_vision/
Assigned Action Editor: ~Sanghyuk_Chun1
Submission Number: 1469