On the Convergence of DP-SGD with Adaptive Clipping

Published: 10 Oct 2024, Last Modified: 07 Dec 2024, NeurIPS 2024 Workshop, CC BY 4.0
Abstract: Stochastic Gradient Descent (SGD) with gradient clipping has emerged as a powerful technique for stabilizing neural network training and enabling differentially private optimization. While constant clipping has been extensively studied, adaptive methods like quantile clipping have shown empirical success without thorough theoretical understanding. This paper provides the first comprehensive convergence analysis of SGD with gradient quantile clipping (QC-SGD). We demonstrate that QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but show this can be mitigated through a carefully designed quantile and step size schedule. Furthermore, the analysis is extended to the differentially private case. We establish theoretical foundations for this widely-used heuristic and identify open problems to guide future research.
Submission Number: 48
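
To make the quantile clipping heuristic concrete, the following is a minimal Python/NumPy sketch of one QC-SGD step with Gaussian noise, not the paper's algorithm: the clipping threshold is set to the q-th quantile of the per-sample gradient norms in the batch, each per-sample gradient is clipped to that norm, and noise calibrated to the threshold is added before averaging. The function and parameter names (`qc_dpsgd_step`, `q`, `noise_multiplier`) are illustrative assumptions. Note also that a fully differentially private implementation would have to estimate the quantile itself privately; this sketch omits that step for brevity.

```python
import numpy as np

def qc_dpsgd_step(params, per_sample_grads, q=0.5, lr=0.1,
                  noise_multiplier=1.0, rng=None):
    """One SGD step with gradient quantile clipping and Gaussian noise (illustrative sketch).

    The clipping threshold is the q-th quantile of the per-sample gradient
    norms in the current batch; each gradient is rescaled to have norm at
    most that threshold, Gaussian noise proportional to the threshold is
    added, and the result is averaged and applied as the update.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_sample_grads, axis=1)
    c = np.quantile(norms, q)                              # adaptive clipping threshold
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))  # per-sample clipping factors
    clipped = per_sample_grads * scale[:, None]            # norms are now at most c
    noise = rng.normal(0.0, noise_multiplier * c, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_sample_grads)
    return params - lr * noisy_mean

# Toy usage: minimize the mean squared distance to a target vector.
rng = np.random.default_rng(0)
target = np.ones(5)
params = np.zeros(5)
for t in range(200):
    data = target + rng.normal(0.0, 0.5, size=(32, 5))  # noisy observations
    per_sample_grads = params[None, :] - data            # gradient of 0.5 * ||params - x||^2
    params = qc_dpsgd_step(params, per_sample_grads, q=0.5, lr=0.1,
                           noise_multiplier=0.5, rng=rng)
print(params)
```

Because the threshold tracks the batch's gradient norm distribution rather than a fixed constant, the amount of clipping (and hence the clipping bias the paper analyzes) is controlled by the chosen quantile q and can be scheduled over training, which is the knob the abstract's quantile and step size schedule refers to.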