Distributed Fine-tuning of Language Models on Private Data


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, the language of users (e.g. in private messaging) could change in a year and be completely different from what we observe in publicly available data. At the same time, public data can be used for obtaining general knowledge (i.e. general model of English). We study approaches to distributed fine-tuning of a general model on user private data with the additional requirement of maintaining the quality on the general data. Our experiments demonstrate that a technique based on model averaging and random rehearsal outperforms an approach based on transfer learning, and show that the proposed method improves prediction quality in a reasonable time. The procedure leads to an 8.7 percentage point improvement in keystroke saving rate on informal English texts compared to a basic model trained on Wikipedia. We also propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees.
  • TL;DR: We propose a method of distributed fine-tuning of language models on user devices without collection of private data
  • Keywords: distributed training, federated learning, language modeling, differential privacy