Towards Efficient and Scalable Implementation of Differentially Private Deep Learning

23 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY 4.0
TL;DR: We study the computational efficiency of different implementations of the differentially private stochastic gradient descent (DP-SGD) algorithm when implemented with proper Poisson subsampling.
Abstract: Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The most common DP-SGD privacy accountants rely on Poisson subsampling to ensure the theoretical DP guarantees. Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, which leads to many implementations that ignore this requirement. We quantify the computational cost of training deep learning models under differential privacy by benchmarking efficient methods that satisfy the Poisson subsampling requirement. We find that a naive implementation of DP-SGD with Opacus in PyTorch has a throughput 2.6 to 8 times lower than that of SGD. However, efficient gradient-clipping implementations such as Ghost Clipping can roughly halve this cost. We propose alternative, computationally efficient ways of implementing DP-SGD with JAX that use Poisson subsampling and perform comparably to efficient clipping optimizations based on PyTorch. We highlight important implementation considerations with JAX. Finally, we study the scaling behavior using up to 80 GPUs and find that DP-SGD scales better than SGD.
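To make the Poisson-subsampling requirement concrete, the following is a minimal, illustrative JAX sketch of a single DP-SGD step with per-example clipping, Gaussian noise, and Poisson subsampling. It is not the paper's implementation; the toy model, loss, and parameter names (`example_loss`, `sample_rate`, `clip_norm`, `noise_multiplier`) are hypothetical placeholders, and normalizing by the expected batch size is just one common convention.

```python
# Illustrative sketch only: one DP-SGD step with Poisson subsampling in JAX.
# All names and hyperparameters here are hypothetical, not taken from the paper.
import jax
import jax.numpy as jnp


def example_loss(params, x, y):
    # Squared error for a toy linear model on a single example.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return (pred - y) ** 2


def dp_sgd_step(params, data_x, data_y, key, lr=0.1,
                sample_rate=0.01, clip_norm=1.0, noise_multiplier=1.0):
    key_sample, key_noise = jax.random.split(key)

    # Poisson subsampling: each example is included independently with
    # probability `sample_rate`, so the batch size is random.
    mask = jax.random.bernoulli(key_sample, sample_rate, (data_x.shape[0],))

    # Per-example gradients via vmap over single examples.
    per_example_grads = jax.vmap(
        jax.grad(example_loss), in_axes=(None, 0, 0)
    )(params, data_x, data_y)

    # Per-example L2 norms across all parameter leaves, then clipping factors.
    leaves = jax.tree_util.tree_leaves(per_example_grads)
    sq_norms = sum(jnp.sum(l.reshape(l.shape[0], -1) ** 2, axis=1) for l in leaves)
    clip_factors = jnp.minimum(1.0, clip_norm / (jnp.sqrt(sq_norms) + 1e-12))

    # Apply the Poisson mask and clipping factors, then sum over examples.
    def clip_and_sum(g):
        scale = (clip_factors * mask).reshape((-1,) + (1,) * (g.ndim - 1))
        return jnp.sum(g * scale, axis=0)

    summed = jax.tree_util.tree_map(clip_and_sum, per_example_grads)

    # Add Gaussian noise calibrated to the clipping norm, then normalize by the
    # expected batch size (one common convention under Poisson subsampling).
    expected_batch = sample_rate * data_x.shape[0]
    flat, treedef = jax.tree_util.tree_flatten(summed)
    noise_keys = jax.random.split(key_noise, len(flat))
    noisy = [
        (g + noise_multiplier * clip_norm * jax.random.normal(k, g.shape))
        / expected_batch
        for g, k in zip(flat, noise_keys)
    ]
    noisy_grads = jax.tree_util.tree_unflatten(treedef, noisy)

    # Plain SGD update with the privatized gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy_grads)


# Usage example with toy data.
key = jax.random.PRNGKey(0)
data_x = jax.random.normal(key, (256, 5))
data_y = jnp.zeros(256)
params = {"w": jnp.zeros(5), "b": jnp.zeros(())}
params = dp_sgd_step(params, data_x, data_y, key)
```

Because the mask is applied inside a fixed-size batch, the step can be `jit`-compiled without the variable-length batches that Poisson subsampling would otherwise introduce; this is one way (under the assumptions above) to reconcile correct subsampling with static shapes in JAX.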
Primary Area: Social Aspects->Privacy
Keywords: differential privacy, gradient based optimization, computational efficiency, distributed computing
Submission Number: 9343