Analyzing and Defending against Membership Inference Attacks in Natural Language Processing Classification

Published: 01 Jan 2022, Last Modified: 04 Feb 2024 · IEEE Big Data 2022
Abstract: The risk posed by Membership Inference Attacks (MIA) to deep learning models for Computer Vision (CV) tasks is well known, but MIA has not been fully addressed or explored in the Natural Language Processing (NLP) domain. In this work, we analyze the security risk posed by MIA to NLP models. We show that NLP models are at great risk from MIA, in some cases even more so than models trained on CV datasets: on average, the attack success rate is 8.04% higher for NLP models than for CV models. We find that characteristics unique to NLP classification tasks, in terms of model overfitting, model complexity, and data diversity, make the privacy leakage severe and very different from that in CV classification tasks. Based on these findings, we propose a novel defense algorithm, Gap score Regularization Integrated Pruning (GRIP), which protects NLP models against MIA while achieving competitive testing accuracy. Our experimental results show that GRIP can decrease the MIA success rate by as much as 31.25% compared to the undefended model. In addition, compared to differential privacy, GRIP offers 7.81% more robustness to MIA and 13.24% higher testing accuracy. Overall, our experiments span four NLP and two CV datasets and a total of five different model architectures.