Simple Baselines Are Strong Performers for Differentially Private Natural Language Processing

17 Sept 2021, 04:26 (modified: 17 Sept 2021, 09:06)
PRIML 2021 Oral
Readers: Everyone
Keywords: differential privacy, language models, fine-tuning, NLP
TL;DR: We show that with the right setup, fine-tuning language models with DP-Adam has strong performance on datasets of modest sizes.
Abstract: Differentially private learning has seen limited success for deep learning models of text, resulting in a perception that differential privacy may be incompatible with the language model fine-tuning paradigm. We demonstrate that this perception is inaccurate and that, with the right setup, high-performing private models can be learned on moderately sized corpora by directly fine-tuning with differentially private optimization. Our work highlights the important role of hyperparameters, task formulations, and pretrained models. Our analyses also show that the low performance of naive differentially private baselines in prior work is attributable to suboptimal choices in these factors. Empirical results reveal that differentially private optimization does not suffer from dimension-dependent performance degradation with pretrained models and achieves performance on par with state-of-the-art private training procedures and strong non-private baselines.
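
For readers unfamiliar with the term, the following is a minimal sketch of what one step of differentially private optimization in the DP-Adam style typically looks like: each example's gradient is clipped to a fixed L2 norm, Gaussian noise is added to the clipped sum, and the averaged result is passed to an ordinary Adam update. It is an illustration under stated assumptions, not the authors' implementation. The names clip_and_noise, adam_step, clip_norm, and noise_multiplier are hypothetical, gradients are plain Python lists, and privacy accounting (tracking the resulting epsilon and delta) is omitted entirely.

```python
import math
import random


def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng):
    """Privatize a batch of per-example gradients (each a list of floats).

    Hypothetical helper for illustration; not taken from the paper.
    """
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale so each example contributes at most clip_norm in L2 norm.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for i, x in enumerate(g):
            summed[i] += x * scale
    # Gaussian noise with std = noise_multiplier * clip_norm on the clipped sum.
    std = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    return [(s + rng.gauss(0.0, std)) / batch for s in summed]


def adam_step(params, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update, here fed the privatized gradient."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Usage: two per-example gradients, one above and one below the clipping norm.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
private_grad = clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
params = adam_step([0.0, 0.0], private_grad, state)
```

In this sketch the first gradient (norm 5) is scaled down to norm 1 while the second (norm 0.5) passes through unchanged, so no single example can move the update by more than clip_norm before noise is added. A practical setup would compute per-example gradients with a deep learning framework and pair the step with a privacy accountant.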
Paper Under Submission: The paper is NOT under submission at NeurIPS