28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone
Keywords: BERT-based models, vulnerabilities, attribute inference, transferability
Abstract: Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pretrained BERT models. These models allow corporations to easily build powerful APIs by encapsulating fine-tuned BERT models. Such BERT-based APIs are often designed not only to provide reliable service but also to protect intellectual property and privacy-sensitive information in the training data. However, a series of privacy and robustness issues may still exist when a fine-tuned BERT model is deployed as a service. In this work, we first present an effective model extraction attack, in which the adversary can practically steal a BERT-based API (the target/victim model). We then demonstrate: (1) how the extracted model can be further exploited to mount an effective attribute inference attack that exposes sensitive information about the victim model's training data; (2) how the extracted model can be used to craft highly transferable adversarial attacks against the victim model. Extensive experiments on multiple benchmark datasets under various realistic settings validate the potential privacy and adversarial vulnerabilities of BERT-based APIs.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We demonstrate that a model extracted from a BERT-based API can be used to strengthen sensitive attribute inference and to improve adversarial transferability against the victim model.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=m5c63dz4AF
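Illustrative sketch (not from the paper): the abstract describes model extraction only at a high level, so the following toy Python example shows the general query-then-imitate idea under loud assumptions. The names `victim_api`, `VOCAB`, `random_query`, `train_extracted_model` and `extracted_predict` are hypothetical, the victim is a keyword rule rather than a fine-tuned BERT model, and the extracted model is a simple perceptron rather than BERT. A realistic attack would presumably query a remote API and fine-tune a pretrained model on the returned predictions.

```python
import random

# Hypothetical stand-in for a deployed classification API.
# The attacker only observes the returned label, never the rule inside.
def victim_api(text: str) -> int:
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "awful"}
    words = text.split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score > 0 else 0

# Assumed attacker-side vocabulary used to build unlabeled queries.
VOCAB = ["good", "great", "excellent", "bad", "poor", "awful",
         "movie", "plot", "the", "was"]

def random_query(rng: random.Random, length: int = 5) -> str:
    """Build one unlabeled query by sampling words from VOCAB."""
    return " ".join(rng.choice(VOCAB) for _ in range(length))

def train_extracted_model(queries, labels, epochs=50):
    """Fit a bag-of-words perceptron to the labels returned by the victim."""
    weights = {w: 0.0 for w in VOCAB}
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(queries, labels):
            words = text.split()
            pred = 1 if bias + sum(weights[w] for w in words) > 0 else 0
            if pred != y:
                step = y - pred  # +1 or -1
                bias += step
                for w in words:
                    weights[w] += step
    return weights, bias

def extracted_predict(model, text: str) -> int:
    """Predict with the extracted (imitation) model."""
    weights, bias = model
    total = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if total > 0 else 0

rng = random.Random(0)

# Step 1: query the victim with attacker-chosen inputs and record its labels.
queries = [random_query(rng) for _ in range(500)]
labels = [victim_api(q) for q in queries]

# Step 2: train a local copy on the (query, victim label) pairs.
model = train_extracted_model(queries, labels)

# Step 3: measure how often the copy agrees with the victim on fresh inputs.
held_out = [random_query(rng) for _ in range(200)]
agreement = sum(
    extracted_predict(model, q) == victim_api(q) for q in held_out
) / len(held_out)
print(f"agreement with victim: {agreement:.2f}")
```

Agreement between the local copy and the victim on held-out queries is one common way to gauge how faithful an extraction is. A high-agreement copy is what would then be reused for the downstream attribute inference and adversarial transfer attacks the abstract mentions.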
