TL;DR: We propose model personalization algorithms under user-level differential privacy via shared representation learning.
Abstract: We study model personalization under user-level differential privacy (DP) in the shared representation framework. In this problem, there are $n$ users whose data is statistically heterogeneous, and their optimal parameters share an unknown embedding $U^* \in \mathbb{R}^{d\times k}$ that maps the user parameters in $\mathbb{R}^d$ to low-dimensional representations in $\mathbb{R}^k$, where $k \ll d$. Our goal is to privately recover the shared embedding and the local low-dimensional representations with small excess risk in the federated setting. We propose a private, efficient federated learning algorithm to learn the shared embedding, based on the FedRep algorithm of (Collins et al., 2021). Unlike (Collins et al., 2021), our algorithm satisfies differential privacy, and our results hold in the presence of noisy labels. In contrast to prior work on private model personalization (Jain et al., 2021), our utility guarantees hold for a larger class of user distributions (sub-Gaussian rather than Gaussian). Additionally, in natural parameter regimes, we improve the privacy error term of (Jain et al., 2021) by a factor of $\widetilde{O}(dk)$. Next, we consider the binary classification setting. We present an information-theoretic construction that privately learns the shared embedding and derive a margin-based accuracy guarantee that is independent of $d$. Our method uses the Johnson-Lindenstrauss transform to reduce the effective dimensions of the shared embedding and the users' data. This result shows that dimension-independent risk bounds are attainable in this setting under a margin loss.
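For intuition, the following is a minimal NumPy sketch of the shared-representation setup and a FedRep-style alternating update with user-level clipping and Gaussian noise. It is purely illustrative: the clipping threshold `clip_norm`, the noise scale `noise_std`, the step size, and the exact update rule below are our own assumed choices, not the paper's algorithm or its calibrated parameters.

```python
# Illustrative sketch only: a FedRep-style alternating update with a
# user-level Gaussian mechanism. All hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, m = 50, 100, 5, 200   # users, ambient dim, latent dim, samples/user

# Ground truth: shared column-orthonormal embedding U* and local heads v_i*.
U_star, _ = np.linalg.qr(rng.normal(size=(d, k)))
V_star = rng.normal(size=(n, k))

# Per-user linear-regression data with noisy labels: y = x^T U* v_i* + noise.
X = rng.normal(size=(n, m, d))
y = np.einsum('umd,dk,uk->um', X, U_star, V_star) + 0.1 * rng.normal(size=(n, m))

U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # shared embedding estimate
clip_norm, noise_std, lr = 1.0, 0.5, 0.1       # assumed DP knobs

for _ in range(30):
    grads = np.zeros((d, k))
    for i in range(n):
        Z = X[i] @ U                                    # features in R^k
        v_i, *_ = np.linalg.lstsq(Z, y[i], rcond=None)  # local head fit
        r = X[i] @ (U @ v_i) - y[i]                     # residuals
        g = (X[i].T @ r)[:, None] @ v_i[None, :] / m    # user's gradient wrt U
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip per user
        grads += g
    # User-level Gaussian mechanism: noise scaled to the clipping norm.
    grads += noise_std * clip_norm * rng.normal(size=(d, k))
    U -= lr * grads / n
    U, _ = np.linalg.qr(U)                              # re-orthonormalize

# Subspace distance between span(U) and span(U*): smaller is better.
err = np.linalg.norm((np.eye(d) - U @ U.T) @ U_star, 2)
print(f"subspace error: {err:.3f}")
```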
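Likewise, a small sketch of the dimension-reduction idea behind the classification result: projecting both the data and the predictor through a random Gaussian Johnson-Lindenstrauss matrix approximately preserves the margins $\langle x, U v \rangle$, so the effective dimension drops from $d$ to the projection dimension. The matrix `Phi` and all dimensions below are illustrative assumptions; the paper's private construction and margin analysis are not reproduced here.

```python
# Illustrative Johnson-Lindenstrauss sketch: margins are roughly preserved
# when both data and predictor are projected to a lower dimension.
import numpy as np

rng = np.random.default_rng(1)
d, k, m, p = 2000, 5, 500, 256   # ambient dim, latent dim, samples, JL dim

U = np.linalg.qr(rng.normal(size=(d, k)))[0]   # a shared embedding
v = rng.normal(size=k)                          # a low-dimensional head
X = rng.normal(size=(m, d)) / np.sqrt(d)        # unit-scale data points

Phi = rng.normal(size=(p, d)) / np.sqrt(p)      # Gaussian JL matrix

margins = X @ (U @ v)                           # margins in R^d
margins_jl = (X @ Phi.T) @ (Phi @ (U @ v))      # margins after projection
print("max margin distortion:", np.max(np.abs(margins - margins_jl)))
```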
Lay Summary: We develop new methods for training personalized machine learning models in a way that rigorously protects users' privacy. In many real-world applications, users have different kinds of data, so personalized models tailored to each user's data are more accurate than a single shared model. However, training such models while safeguarding sensitive user information remains a major challenge.
Our work tackles this challenge by proposing private algorithms that allow users to collaboratively learn a shared low-dimensional representation of their data without revealing individual information. Compared to earlier approaches, our method works under more realistic assumptions about user data and offers improved theoretical guarantees on both privacy and accuracy.
We also extend our results to binary classification tasks, where we show that privacy-preserving personalization can be achieved with guarantees that do not depend on the data’s complexity (i.e., its dimensionality), thanks to dimensionality reduction techniques.
Primary Area: Social Aspects->Privacy
Keywords: Differential Privacy, Model Personalization, Federated/Multi-task Learning, Representation Learning, Statistical Heterogeneity
Submission Number: 11924