Keywords: model stealing, model extraction, defenses, encoders, sentence embedding
TL;DR: Our stealing of self-supervised encoders uses less computing power than a victim’s training and up to 40x fewer queries than the number of samples in a victim’s training dataset.
Abstract: Self-supervised learning (SSL) has become the predominant approach to training on large amounts of data when no labels are available. Since the corresponding model architectures are usually large, the training process is, in itself, costly, and training relies on dedicated expensive hardware. As a consequence, not every party can train such models from scratch. Instead, new APIs offer paid access to pre-trained SSL models. We consider transformer-based SSL sentence encoders and show that they can be efficiently extracted (stolen) from behind these APIs through black-box query access. Our stealing requires down to 40x fewer queries than the number of the victim's training data points and much less computation. This large gap between low attack costs and high victim training costs strongly incentivizes attackers to steal encoders. To protect the transformer-based sentence encoders against stealing, we propose to embed secret downstream tasks to their training which serve as watermarks. In general, our work highlights that sentence embedding encoders are easily stolen but hard to defend.
0 Replies
Loading