Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: differential privacy, distributed learning, privacy-preserving machine learning, privacy, federated learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a differentially private, non-interactive, distributed learning algorithm that scales gracefully with the number of users; it uses blind averaging of either SVMs or linear Softmax layers.
Abstract: Compared to differentially private centralized learning, where all data are aggregated at one party, differentially private massively distributed learning poses one key challenge: minimizing communication overhead while achieving strong utility-privacy tradeoffs. The minimal amount of communication for distributed learning is non-interactive communication, i.e., each party sends only one message.
In this work, we propose two differentially private, non-interactive, distributed learning algorithms in a framework called
Secure Distributed \helmet. This framework is based on what we coin blind averaging: each party locally learns and noises a model, and all parties then jointly compute the mean of their models via a secure summation protocol (e.g., secure multiparty computation). The learning algorithms we consider for blind averaging are empirical risk minimizers (ERM) like SVMs and Softmax-activated single-layer perceptrons (Softmax-SLP). We show that blind averaging preserves privacy if the models are averaged via secure summation and the objective function is smooth, Lipschitz, and strongly convex. We show that the objective function of Softmax-SLP fulfills these criteria, which implies leave-one-out robustness and might be of independent interest.
On the practical side, we provide experimental evidence that blind averaging for SVMs and Softmax-SLP can achieve a strong utility-privacy tradeoff: we reach an accuracy of $86\,\%$ on CIFAR-10 for $\varepsilon = 0.36$ and $1{,}000$ users and of $44\,\%$ on CIFAR-100 for $\varepsilon = 1.18$ and $100$ users, both after SimCLR-based pre-training. As an ablation, we study the resilience of our approach to a strongly non-IID setting.
On the theoretical side, we show that, in the limit, blind averaging of hinge-loss-based SVMs converges to the centrally learned SVM.
Our approach is based on the representer theorem and can be seen as a blueprint for proving convergence for other ERM problems like Softmax-SLP.
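To make the blind-averaging idea concrete, the following is a minimal, hypothetical sketch (not the authors' code): each party trains a local linear SVM by subgradient descent on a regularized hinge loss, adds Gaussian noise to its weights, and the parties reveal only the mean via a simulated secure-summation step using pairwise additive masks that cancel in the sum. All function names, hyperparameters, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_model(X, y, lam=0.1, lr=0.1, steps=200):
    """Subgradient descent on an L2-regularized hinge loss (a linear SVM).
    Illustrative stand-in for each party's local ERM learner."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Subgradient: lam*w minus the average of y_i * x_i over margin violators.
        grad = lam * w - (X * y[:, None])[margins < 1].sum(axis=0) / len(y)
        w -= lr * grad
    return w

def blind_average(models, sigma=0.0):
    """Each party noises its model locally; a secure summation then reveals
    only the sum (here simulated with pairwise masks that telescope to zero)."""
    noised = [w + rng.normal(0, sigma, size=w.shape) for w in models]
    n = len(noised)
    masks = [rng.normal(size=noised[0].shape) for _ in range(n)]
    shares = [noised[i] + masks[i] - masks[(i + 1) % n] for i in range(n)]
    return sum(shares) / n  # equals the mean of the noised models

# Toy linearly separable data, split across 3 parties (non-interactive: one
# "message" per party, namely its noised model).
X = rng.normal(size=(300, 5))
y = np.sign(X @ np.ones(5))
parts = np.array_split(np.arange(300), 3)
models = [local_model(X[p], y[p]) for p in parts]
w_avg = blind_average(models, sigma=0.01)
acc = (np.sign(X @ w_avg) == y).mean()
```

Note that the noise scale `sigma` here is a placeholder; a real deployment would calibrate it to the ERM sensitivity bound to obtain a concrete $(\varepsilon, \delta)$ guarantee.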
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5787