On the Convergence and Calibration of Deep Learning with Differential Privacy

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: deep learning, differential privacy, calibration, convergence, neural tangent kernel
TL;DR: We show that differentially private deep learning can be severely miscalibrated due to per-sample gradient clipping, which can be alleviated by a new clipping method.
Abstract: A differentially private (DP) neural network typically achieves privacy at the cost of slower convergence (and thus lower performance) than its non-private counterpart. To analyze the difficulty of DP training, this work gives the first convergence analysis through the lens of training dynamics and the neural tangent kernel (NTK). We characterize the effects of two key components of DP training: per-sample gradient clipping (flat or layerwise) and noise addition. Our analysis not only initiates a general, principled framework for understanding DP deep learning with any network architecture and loss function, but also motivates a new clipping method -- the \textit{global clipping} -- that significantly improves convergence while preserving the same DP guarantee and computational efficiency as the existing method, which we term \textit{local clipping}. Theoretically, we precisely characterize the effect of per-sample clipping on the NTK matrix and show that the noise scale of DP optimizers does not affect convergence in the \textit{gradient flow} regime. In particular, we shed light on several behaviors that are guaranteed only by our global clipping. For example, global clipping preserves the positive semi-definiteness of the NTK, which is almost certainly broken by local clipping; DP gradient descent (GD) with global clipping converges monotonically to zero loss, while the convergence of local clipping can be non-monotone; and global clipping is surprisingly effective at learning \textit{calibrated classifiers}, whereas existing DP classifiers are oftentimes overconfident and unreliable. Notably, our analysis framework extends easily to other optimizers, e.g., DP-Adam. We demonstrate through numerous experiments that DP optimizers equipped with global clipping perform strongly.
Implementation-wise, global clipping can be realized by inserting only one line of code into the PyTorch \texttt{Opacus} library.
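To make the contrast between the two clipping methods concrete, here is a minimal NumPy sketch. The function names, the threshold \texttt{Z}, and the exact form of the global-clipping rule below are illustrative assumptions for exposition, not the authors' implementation: local clipping rescales each per-sample gradient to norm at most $C$, while the global variant sketched here applies one common scale to all gradients whose norm falls below a threshold and drops the rest, so the surviving gradients keep their relative magnitudes.

```python
import numpy as np

def local_clip(per_sample_grads, C):
    """Standard DP-SGD per-sample (local) clipping: scale each
    gradient g_i by min(1, C / ||g_i||), then sum. Large gradients
    are rescaled individually, distorting their relative magnitudes."""
    total = np.zeros_like(per_sample_grads[0])
    for g in per_sample_grads:
        total += g * min(1.0, C / np.linalg.norm(g))
    return total

def global_clip(per_sample_grads, Z, C):
    """Illustrative global-clipping variant (assumed form, not the
    paper's exact rule): gradients with norm <= Z are kept at a common
    scale C / Z, preserving their relative magnitudes; larger gradients
    are dropped. Every surviving term still has norm <= C, so noise
    calibrated to C gives the same DP guarantee as local clipping."""
    total = np.zeros_like(per_sample_grads[0])
    for g in per_sample_grads:
        if np.linalg.norm(g) <= Z:
            total += g * (C / Z)
    return total

def dp_noisy_sum(clipped_sum, C, sigma, rng):
    """Gaussian mechanism: add noise calibrated to the clipping bound C."""
    return clipped_sum + rng.normal(0.0, sigma * C, size=clipped_sum.shape)

# Two per-sample gradients with norms 5.0 and 1.0.
grads = [np.array([3.0, 4.0]), np.array([0.6, 0.8])]
print(local_clip(grads, C=1.0))          # [1.2, 1.6]
print(global_clip(grads, Z=2.0, C=1.0))  # [0.3, 0.4]
```

In this sketch, local clipping rescales the large gradient down to the bound $C$ and keeps it, whereas the global variant discards it entirely; this uniform treatment of the retained gradients is what the abstract credits for preserving the NTK's positive semi-definiteness.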
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
Supplementary Material: zip
