Public Data-Assisted Mirror Descent for Private Model Training

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission
Keywords: Differential Privacy, Public Data, Mirror Descent
Abstract: In this paper, we revisit the problem of effectively using public data to improve the privacy/utility trade-offs for differentially private (DP) model training. Here, public data refers to auxiliary data sets that have no privacy concerns. We consider public training data sets that are from the *same distribution* as the private training data set. For convex losses, we show that a variant of Mirror Descent provides population risk guarantees which are independent of the dimension of the model ($p$). Specifically, we apply Mirror Descent with the loss generated by the public data as the *mirror map*, using DP gradients of the loss generated by the private (sensitive) data. To obtain dimension independence, we require $G_Q^2 \leq p$ public data samples, where $G_Q$ is the Gaussian width of the smallest convex set $Q$ such that the public loss functions are 1-strongly convex with respect to $\|\cdot\|_Q$. Our method is also applicable to non-convex losses, as it does not rely on convexity assumptions to ensure DP guarantees. We further show that our algorithm has a natural "noise stability" property: if, in a bounded region around the current iterate, the public loss satisfies $\alpha_v$-strong convexity in a direction $v$, then using noisy gradients instead of exact gradients shifts our next iterate in the direction $v$ by an amount proportional to $1/\alpha_v$ (in contrast with DP stochastic gradient descent (DP-SGD), where the shift is isotropic). To achieve analogous results, prior works had to explicitly learn the geometry from the public data in the form of preconditioner matrices. We demonstrate the empirical efficacy of our algorithm by showing privacy/utility trade-offs on linear regression and on deep learning benchmark datasets (CIFAR-10, EMNIST, and WikiText-2). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but also improves over DP-SGD on models that have been pretrained with the public data to begin with.
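The abstract's update rule can be made concrete: writing $\Phi$ for the loss generated by the public data, the mirror descent step solves $\nabla \Phi(w_{t+1}) = \nabla \Phi(w_t) - \eta\,\tilde g_t$, where $\tilde g_t$ is a clipped, noised gradient of the private loss. Below is a minimal sketch (not the authors' code) for linear regression, where a quadratic public loss makes the mirror step a simple preconditioned update; the synthetic data, the hyperparameters (`clip_norm`, `noise_mult`, `lr`, `reg`), and the omission of privacy accounting are all illustrative assumptions.

```python
# Minimal sketch of public-data-assisted mirror descent for DP linear
# regression. Assumes a quadratic public loss as the mirror map; all
# hyperparameters are illustrative, not tuned, and privacy accounting
# (tracking epsilon/delta across steps) is omitted.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic private and public data drawn from the same distribution.
n_priv, n_pub, p = 2000, 500, 50
w_true = rng.normal(size=p)
X_priv = rng.normal(size=(n_priv, p))
y_priv = X_priv @ w_true + 0.1 * rng.normal(size=n_priv)
X_pub = rng.normal(size=(n_pub, p))

# Mirror map Phi(w) = public squared loss + small regularizer. For a
# quadratic Phi, grad Phi(w) = A w, so the mirror step reduces to
# preconditioning the noisy private gradient by A^{-1}.
reg = 1e-2
A = X_pub.T @ X_pub / n_pub + reg * np.eye(p)
A_inv = np.linalg.inv(A)

clip_norm, noise_mult, lr, steps = 1.0, 1.0, 0.5, 200
w = np.zeros(p)
for _ in range(steps):
    # Per-example gradients of the private squared loss, clipped for DP.
    residual = X_priv @ w - y_priv
    per_ex_grads = residual[:, None] * X_priv          # shape (n_priv, p)
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    per_ex_grads /= np.maximum(1.0, norms / clip_norm)
    # Gaussian-mechanism noise calibrated to the clipping norm.
    noise = noise_mult * clip_norm * rng.normal(size=p)
    noisy_grad = (per_ex_grads.sum(axis=0) + noise) / n_priv
    # Mirror step: grad Phi(w_next) = grad Phi(w) - lr * noisy_grad,
    # i.e. w_next = w - lr * A^{-1} @ noisy_grad for quadratic Phi.
    w = w - lr * A_inv @ noisy_grad

print("private train MSE:", np.mean((X_priv @ w - y_priv) ** 2))
```

Note how the noise stability property appears in the last update: directions in which the public loss is strongly convex correspond to large eigenvalues of $A$, so $A^{-1}$ damps the injected noise in exactly those directions, whereas plain DP-SGD would apply the isotropic noise unmodified.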
One-sentence Summary: In this paper, we show, via the theory of mirror maps, that one can effectively use the loss function generated by public data as a regularizer to control the noise variance in differentially private model training.