Keywords: Differential Privacy, low-pass filter, noise reduction, non-convex optimization
TL;DR: We propose a noise-reduction approach that uses a Kalman filter to improve DP training performance.
Abstract: Differentially private (DP) optimizers have been widely used to train modern machine learning models while protecting the privacy of training data. A popular approach to privatizing an optimizer is to clip the individual gradients and add sufficiently large noise to the clipped gradient. However, a significant performance drop is observed when these optimizers are applied to large-scale model (pre-)training. This degradation stems from the substantial noise injection required to maintain DP, which disrupts the optimizer's dynamics. This paper introduces DiSK, a novel framework designed to significantly enhance the performance of DP optimizers. DiSK employs Kalman filtering, a technique drawn from control and signal processing, to denoise privatized gradients and generate progressively refined gradient estimates. To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands. We establish theoretical privacy-utility trade-off guarantees for DiSK and demonstrate provable improvements over standard DP optimizers such as DPSGD. Extensive experiments across diverse tasks, including vision tasks such as CIFAR-100 and ImageNet-1k and language fine-tuning tasks such as GLUE, E2E, and DART, validate the effectiveness of DiSK. The results show that it significantly improves the performance of DP optimizers, surpassing state-of-the-art results under the same privacy constraints.
Submission Number: 21
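
The abstract describes two steps: privatizing a gradient by per-sample clipping plus Gaussian noise, and then denoising the privatized gradient with a filter. The following is a minimal toy sketch of that idea, not the DiSK algorithm itself. It swaps in a fixed-gain low-pass filter (an exponential moving average) in place of the simplified Kalman filter, and every name and parameter here (`clip`, `privatize`, `low_pass`, `demo`, `gain`, `noise_multiplier`) is hypothetical and chosen for illustration. It also assumes a constant true gradient, which real training does not have, and it does no privacy accounting.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def privatize(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    std = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, std)) / n for t in total]

def low_pass(prev_estimate, noisy_grad, gain):
    """Fixed-gain filter: blend the previous estimate with the new observation."""
    return [(1.0 - gain) * p + gain * g for p, g in zip(prev_estimate, noisy_grad)]

def demo(steps=200, seed=0):
    """Return mean squared error of raw vs. filtered privatized gradients."""
    rng = random.Random(seed)
    true_grad = [0.3, -0.4]      # assumed constant; norm 0.5, so clipping is inactive
    batch = [true_grad] * 8      # toy batch of identical per-sample gradients
    estimate = [0.0, 0.0]
    raw_err = filt_err = 0.0
    for _ in range(steps):
        noisy = privatize(batch, max_norm=1.0, noise_multiplier=1.0, rng=rng)
        estimate = low_pass(estimate, noisy, gain=0.1)
        raw_err += sum((a - b) ** 2 for a, b in zip(noisy, true_grad))
        filt_err += sum((a - b) ** 2 for a, b in zip(estimate, true_grad))
    return raw_err / steps, filt_err / steps

if __name__ == "__main__":
    raw_mse, filtered_mse = demo()
    print("raw MSE:", raw_mse, "filtered MSE:", filtered_mse)
```

Because filtering is post-processing of an already privatized quantity, it does not consume additional privacy budget. In this toy setting the filtered estimate tracks the true gradient more closely than the raw privatized one. With a gradient that changes over iterations, a fixed gain trades noise suppression against lag, which is the kind of trade-off a Kalman-style update is meant to manage.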