Efficient Gradient Clipping Methods in DP-SGD for Convolution Models

ICLR 2026 Conference Submission 21776 Authors

19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Differential Privacy, SGD, Clipping, CNNs, FFT, DP-SGD, Computational Complexity
TL;DR: We provide a computationally and memory-efficient algorithm for per-example gradient norm computation in CNNs, as used in DP-SGD.
Abstract: Differentially private stochastic gradient descent (DP-SGD) is a well-known method for training machine learning models with a specified level of privacy. However, its basic implementation is generally bottlenecked by the computation of the gradient norm (gradient clipping) for each example in an input batch. While various techniques have been developed to mitigate this issue, only a handful of them apply to convolution models, e.g., vision models. In this work, we present three practical methods for performing gradient clipping that improve upon previous state-of-the-art methods. Two of these methods use in-place operations to reduce memory overhead, while the third leverages the relationship between Fourier transforms and convolution layers. We then develop a dynamic algorithm that dispatches to whichever of the three methods is expected to perform best. Extensive benchmarks confirm that this algorithm consistently outperforms other state-of-the-art algorithms and frameworks.
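To make the bottleneck concrete, below is a minimal sketch (not the authors' algorithm) of the per-example gradient clipping step that DP-SGD requires, written in plain PyTorch. The naive loop materializes one gradient per example before clipping and summing, which is exactly the compute and memory overhead the submission aims to reduce; `model`, `loss_fn`, the clip norm `C`, and the noise multiplier `sigma` are illustrative placeholders.

```python
# Naive per-example gradient clipping for DP-SGD (illustrative sketch only).
import torch

def dp_sgd_step(model, loss_fn, xs, ys, C=1.0, sigma=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # one backward pass per example: the bottleneck
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Per-example L2 norm over all parameter gradients.
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, C / (norm.item() + 1e-12))  # clip to norm at most C
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            # Gaussian noise calibrated to the clip norm, averaged over the batch.
            noise = torch.normal(0.0, sigma * C, size=s.shape)
            p.grad = (s + noise) / len(xs)
```

Efficient methods avoid this loop by computing the per-example norms directly from quantities already available in a single forward/backward pass; for convolution layers, the convolution theorem (convolution in the spatial domain corresponds to pointwise multiplication in the Fourier domain) is what makes the FFT-based variant described in the abstract possible.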
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21776