Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

Published: 02 May 2024, Last Modified: 25 Jun 2024 · ICML 2024 Oral · CC BY 4.0
Abstract: The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large *public* datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets *should* be viewed as differential-privacy-preserving. We further scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains. Finally, we observe that reliance on large pretrained models may come at the expense of *other* forms of privacy, as it requires data to be outsourced to a more compute-powerful third party.
Submission Number: 2387