Stealing and Defending Transformer-based Encoders

Adam Dziedzic; Franziska Boenisch; Mingjian Jiang; Haonan Duan; Nicolas Papernot

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023, Readers: Everyone

Keywords: model stealing, model extraction, defenses against model extraction, transformers, encoders, self-supervised learning

TL;DR: We perform attacks against transformer-based encoders and propose a new defense against extraction of vision transformers that combines watermarking with dataset inference.

Abstract: Self-supervised learning (SSL) has become the predominant approach to training on large amounts of unlabeled data. New real-world APIs offer services to generate high-dimensional representations for given inputs based on SSL encoders with transformer architectures. Recent efforts highlight that it is possible to steal high-quality SSL encoders based on convolutional neural networks. In this work, we are the first to extend this line of work to stealing and defending transformer-based encoders in both the language and vision domains. We show that it is possible to steal transformer-based sentence embedding models using only their returned representations and with 40x fewer queries than the number of the victim's training data points. We also decrease the number of stealing queries required for the vision encoders by leveraging semi-supervised learning. Finally, to defend vision transformers against stealing attacks, we propose a defense technique that combines watermarking with dataset inference. Our method creates a unique encoder signature based on a private data subset that acts as a secret seed during training. By applying dataset inference on the seed, we can then successfully identify stolen transformers.
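
The representation-only attack described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: the abstract does not state the loss, the architecture, or the training procedure, so everything below is an assumption made for illustration. The "victim" is a hypothetical one-layer tanh encoder standing in for a black-box API, and the "attacker" fits a student encoder of the same form by gradient descent on the mean squared error between its outputs and the returned representations. All names (victim_encode, steal_encoder, w_victim) are hypothetical.

```python
import numpy as np

def victim_encode(x, w_victim):
    """Hypothetical stand-in for a black-box API that returns representations only."""
    return np.tanh(x @ w_victim)

def steal_encoder(queries, victim_reps, lr=0.1, steps=2000, seed=0):
    """Fit a student encoder tanh(x @ W) to the victim's returned representations.

    Assumption: a plain squared-error objective, minimized by gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, dim_in = queries.shape
    dim_out = victim_reps.shape[1]
    w = rng.normal(scale=0.1, size=(dim_in, dim_out))
    for _ in range(steps):
        pred = np.tanh(queries @ w)
        err = pred - victim_reps
        # Gradient of the per-query squared error, averaged over the queries;
        # (1 - pred**2) is the derivative of tanh.
        grad = queries.T @ (err * (1.0 - pred ** 2)) * (2.0 / n)
        w -= lr * grad
    return w

def mse(a, b):
    """Mean squared error over all entries."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(42)
w_victim = rng.normal(size=(8, 4)) * 0.5   # secret weights, never seen by the attacker
queries = rng.normal(size=(200, 8))        # attacker's stealing queries
victim_reps = victim_encode(queries, w_victim)

w_student = steal_encoder(queries, victim_reps)

# Evaluate on inputs that were not used as stealing queries.
held_out = rng.normal(size=(100, 8))
target = victim_encode(held_out, w_victim)
baseline_error = mse(np.zeros_like(target), target)        # student that always outputs zero
stolen_error = mse(np.tanh(held_out @ w_student), target)  # student after stealing
print(f"baseline error: {baseline_error:.4f}, stolen-encoder error: {stolen_error:.4f}")
```

The attacker here only ever touches queries and victim_reps, which mirrors the setting where an API returns representations and nothing else. A real attack on a transformer encoder would replace the one-layer student with a full model and may use a different objective.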

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)

Supplementary Material: zip
