- Keywords: Privacy, Natural language processing, Classification models, Membership inference attacks
- TL;DR: Investigating sample-level and user-level membership inference attacks against NLP classification models.
- Abstract: The success of natural language processing (NLP) is making NLP applications commonplace. Unfortunately, recent research has shown that privacy may be at stake, because these models are often trained on private user data. While privacy risks have been demonstrated in text generation settings, the privacy risks of text classification, which subsumes myriad downstream applications, remain largely unexplored. In this work, we study the susceptibility of NLP classification models to membership inference (MI), a fundamental type of privacy leakage. We design a comprehensive suite of attacks to assess the risk of sample-level MI as well as the relatively unexplored user-level MI. We introduce novel user-level MI attacks that outperform existing attacks, and we conduct experiments on Transformer-based and RNN-based NLP models. Our evaluations show that user-level MI is significantly stronger than sample-level MI. We further perform in-depth analyses of how various NLP-specific parameters affect MI against NLP classification models.
- Paper Under Submission: The paper is NOT under submission at NeurIPS