Keywords: differential privacy, scalable distributed learning, privacy-preserving machine learning, privacy, federated learning, non-interactivity
TL;DR: Blind model averaging is, in contrast to gradient averaging, non-interactive, maintains a competitive utility-privacy tradeoff, and converges for highly regularized SVMs; we also present the first output sensitivity result for Softmax regression.
Abstract: Scalable distributed differentially private learning would benefit notably from reduced communication and synchronization overhead. The current best methods, based on gradient averaging, inherently require many synchronization rounds. In this work, we analyze blind model averaging for convex and smooth empirical risk minimization (ERM): each user first trains a model to completion locally and then submits it for secure averaging, without any client-side online synchronization. This setting lends itself not only to data-point-level privacy but also to flexible user-level privacy, where the combined impact of a user's trained model does not depend on the number of data points used for training.
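A minimal sketch of this setting, for intuition only: the helper names below, the choice of scikit-learn's LinearSVC, and all hyperparameters are our illustrative assumptions, not the submission's implementation.

    # Blind model averaging sketch: local training to completion, then one
    # averaging step (in practice via secure aggregation), no online sync.
    import numpy as np
    from sklearn.svm import LinearSVC

    def local_train(X, y, reg_weight):
        # Each user fits an L2-regularized linear SVM on local data.
        # In scikit-learn, C is the inverse of the regularization weight.
        clf = LinearSVC(C=1.0 / reg_weight, loss="hinge", max_iter=10_000)
        clf.fit(X, y)
        return np.concatenate([clf.coef_.ravel(), clf.intercept_])

    def blind_average(local_models):
        # A single non-interactive round: average the finished models.
        return np.mean(np.stack(local_models), axis=0)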
In detail, we analyze the utility side of blind model averaging for support vector machines (SVMs) and for the inherently multi-class Softmax regression (SoftmaxReg). On the theory side, we use strong duality to show for SVMs that blind model averaging converges toward centralized training performance if the task is robust to L2 regularization, i.e., if increasing the regularization weight does not destroy utility. Furthermore, we provide theoretical and experimental evidence that blind-averaged Softmax regression works well: we prove strong convexity of the dual problem by proving smoothness of the primal problem. Using this result, we also derive the first output perturbation bounds for Softmax regression. On the experimental side, we corroborate our theoretical SVM convergence results. Furthermore, we observe hints of an even more fine-grained connection between good model-averaging utility and mid-range regularization weights, which lead to compelling utility-privacy tradeoffs for SVMs and Softmax regression on three datasets (CIFAR-10, CIFAR-100, and federated EMNIST embeddings). We additionally provide an ablation for an artificially extreme non-IID scenario.
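As a hedged illustration of how an output sensitivity bound translates into a private release, the sketch below applies the standard Gaussian mechanism to the averaged model; the function name `output_perturb` is our assumption, and the submission's actual sensitivity bound for Softmax regression is not reproduced here.

    import numpy as np

    def output_perturb(avg_model, sensitivity, epsilon, delta, rng=None):
        # Output perturbation via the classical Gaussian mechanism
        # (this sigma calibration is valid for epsilon <= 1).
        # `sensitivity` must upper-bound the L2 distance between averaged
        # models trained on neighboring datasets; deriving such a bound for
        # Softmax regression is what an output sensitivity result provides.
        rng = np.random.default_rng() if rng is None else rng
        sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        return avg_model + rng.normal(0.0, sigma, size=avg_model.shape)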
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9953