Sparsity-Preserving Differentially Private Training of Large Embedding Models

Badih Ghazi; Yangsibo Huang; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang

Sparsity-Preserving Differentially Private Training of Large Embedding Models

Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Published: 21 Sept 2023, Last Modified: 02 Nov 2023NeurIPS 2023 posterEveryoneRevisionsBibTeX

Keywords: Differential Privacy, Recommendation Systems, Embedding Models, Efficient Machine Learning

TL;DR: We present two novel approaches to preserve gradient sparsity in DP-trained large embedding models while maintaining their utility.

Abstract: As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during the private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.

Supplementary Material: zip

Submission Number: 5266

Loading