Keywords: Stochastic Differential Equations, Differential Privacy
TL;DR: Using SDEs, we show that DP-SignSGD outperforms DP-SGD under tight privacy budgets or noisy batches, while DP-SGD is better otherwise, and that adaptive methods need far less hyperparameter tuning across privacy levels.
Abstract: Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with *adaptivity* in optimization through the lens of *stochastic differential equations*, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges to a privacy-utility trade-off of $O(1/\varepsilon^2)$ at a speed independent of $\varepsilon$, while DP-SignSGD converges at a speed *linear* in $\varepsilon$ with an $O(1/\varepsilon)$ trade-off, dominating in high-privacy or high-noise regimes. Under optimal learning rates, both methods reach comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and extend from DP-SignSGD to DP-Adam.
Primary Area: optimization
Submission Number: 24978
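The two private updates compared in the abstract differ only in whether the noisy clipped gradient is used directly (DP-SGD) or only through its sign (DP-SignSGD). A minimal sketch of one update direction, assuming a standard per-example-clipping Gaussian mechanism (function and parameter names are our own illustration, not taken from the paper):

```python
import numpy as np

def dp_step(per_example_grads, lr, clip_norm, noise_mult, sign=False, rng=None):
    """One private update direction with per-example clipping (illustrative sketch).

    sign=False gives a DP-SGD-style step; sign=True a DP-SignSGD-style step.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Clip each example's gradient to L2 norm at most clip_norm.
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    batch = len(clipped)
    # Sum, add Gaussian noise calibrated to the clipping norm, then average.
    noisy = (np.sum(clipped, axis=0)
             + rng.normal(scale=noise_mult * clip_norm,
                          size=clipped[0].shape)) / batch
    # DP-SignSGD keeps only the sign of the noisy average; DP-SGD uses it as-is.
    direction = np.sign(noisy) if sign else noisy
    return -lr * direction
```

Under this sketch, the sign step has a fixed per-coordinate magnitude `lr`, which is consistent with the abstract's observation that DP-SignSGD's optimal learning rate is essentially $\varepsilon$-independent, whereas the DP-SGD step magnitude inherits the noise scale.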