Why Differentially Private Local SGD -- An Analysis of Synchronized-Only Biased Iterate

24 Apr 2023 (modified: 12 Dec 2023), submitted to NeurIPS 2023
Keywords: Local SGD, Differential Privacy, Clipping
Abstract: We argue for using Differentially-Private Local Stochastic Gradient Descent (DP-LSGD) in both centralized and distributed setups, and explain why DP-LSGD enjoys higher clipping efficiency and incurs less clipping bias than classic Differentially-Private Stochastic Gradient Descent (DP-SGD). For both convex and non-convex optimization, we present a generic analysis of the noisy synchronized-only iterates in LSGD, the building block of federated learning, and study its application to differentially-private gradient methods with clipping-based sensitivity control. We point out that, under the current decompose-then-compose framework, there is no essential gap between the privacy analysis of centralized and distributed learning, and DP-SGD is a special case of DP-LSGD. We thus build a unified framework that characterizes the clipping bias via the second moment of the local updates, which opens a direction for systematically guiding DP optimization through variance reduction. We show that DP-LSGD with multiple local iterations produces more concentrated local updates, which enables a more efficient use of the clipping budget and a better utility-privacy tradeoff. In addition, we prove that DP-LSGD converges faster to a small neighborhood of a global/local optimum than regular DP-SGD. Thorough experiments on practical deep learning tasks are provided to support the developed theory.
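To make the abstract's claim that "DP-SGD is a special case of DP-LSGD" concrete, the following is a minimal numpy sketch of one synchronized round of the mechanism the abstract describes: each client runs several local SGD steps, the resulting local update is clipped in L2 norm for sensitivity control, and Gaussian noise is added to the averaged update. This is an illustrative reconstruction, not the authors' implementation; the names `dp_lsgd_round` and `clip` and all hyperparameters (`tau`, `lr`, `C`, `sigma`) are chosen here for exposition. Setting `tau=1` recovers a DP-SGD-style round.

```python
import numpy as np

def clip(v, C):
    """Scale v so its L2 norm is at most C (clipping-based sensitivity control)."""
    n = np.linalg.norm(v)
    return v * min(1.0, C / n) if n > 0 else v

def dp_lsgd_round(x, grads, tau=5, lr=0.1, C=1.0, sigma=0.5, rng=None):
    """One synchronized round of (a sketch of) DP-LSGD.

    x     : current global iterate (ndarray)
    grads : list of per-client gradient oracles g(x) -> ndarray
    tau   : number of local SGD steps per client; tau=1 reduces to a
            DP-SGD-style round, illustrating the special-case claim
    """
    rng = np.random.default_rng() if rng is None else rng
    updates = []
    for g in grads:
        x_local = x.copy()
        for _ in range(tau):              # tau local SGD steps
            x_local -= lr * g(x_local)
        # Clip the whole local update, not each per-step gradient
        updates.append(clip(x_local - x, C))
    # Average the clipped updates and add Gaussian noise calibrated
    # to the per-client sensitivity C / K.
    K = len(grads)
    noise = rng.normal(0.0, sigma * C / K, size=x.shape)
    return x + np.mean(updates, axis=0) + noise
```

The abstract's intuition can be seen in this toy setting: with `tau > 1` the local update contracts toward the client's optimum, so updates are more concentrated and clipping distorts them less, whereas single-step updates spend more of the clipping budget on raw gradient noise.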
Supplementary Material: pdf
Submission Number: 574