Differential Privacy in Distributed Learning: Beyond Uniformly Bounded Stochastic Gradients
This paper explores locally differentially private distributed algorithms that solve non-convex empirical risk minimization problems. Traditional approaches often assume uniformly bounded stochastic gradients, which may not hold in practice. To address this issue, we propose differentially Private Stochastic recursive Momentum with grAdient clipping (PriSMA) that judiciously integrates clipping and momentum to enhance utility while guaranteeing privacy. Without assuming uniformly bounded stochastic gradients, given privacy requirement $(\epsilon,\delta)$, PriSMA achieves a learning error of $\tilde{\mathcal{O}}\big((\frac{\sqrt{d}}{\sqrt{M}N\epsilon})^\frac{2}{5}\big)$, where $M$ is the number of clients, $N$ is the number of data samples on each client and $d$ is the model dimension. This learning error bound is better than the state-of-the-art $\tilde{\mathcal{O}}\big((\frac{\sqrt{d}}{{\sqrt{M}N\epsilon}})^\frac{1}{3}\big)$ in terms of the dependence on $M$ and $N$.