Few-Shot Copycat: Improving Performance of Black-Box Attack with Random Natural Images and Few Examples of Problem Domain
Abstract: Many companies have developed Convolutional Neural Network (CNN) models as products offered through APIs to solve various problems. Protecting the Intellectual Property of these models from potential attacks is therefore a critical concern for these entities. Several studies have identified vulnerabilities in such systems, including model extraction, in which an adversary uses Problem Domain (PD) and Non-Problem Domain (NPD) data to generate an imitation of the target model (Oracle). One example of this attack is the Copycat CNN method, in which the adversary uses NPD images to train a surrogate model with the Oracle's hard labels. The surrogate model is then fine-tuned with PD images labeled by the Oracle, significantly improving performance and reducing the number of Oracle queries. However, PD images are generally expensive and scarce on the Internet. In this study, we introduce Few-Shot Copycat, a novel approach to improving Copycat CNN. With just a few PD images from each class of the target problem, our approach improves on the performance of the method based on NPD images only. It requires far fewer queries to copy the target model, further exposing this threat to companies. The proposed method was evaluated on five real classification problems (Facial Expression Recognition, General Object, Street View House Number, Traffic Sign, and Fashion Image). Results show that Few-Shot Copycat can reduce the number of images required for extraction by at least \(6\times \) (i.e., reducing the number of queries).
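The two-stage extraction pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Oracle here is a hypothetical fixed linear classifier standing in for the black-box API, random vectors stand in for NPD images, and a plain softmax regression stands in for the CNN surrogate. The stages themselves follow the abstract: first train the surrogate on NPD inputs labeled by the Oracle's hard labels, then fine-tune with only a few Oracle-labeled PD examples per class.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, DIM = 3, 20

# Hypothetical "Oracle": a fixed linear classifier standing in for the
# black-box target model. Only its hard labels are observable.
W_oracle = rng.normal(size=(DIM, N_CLASSES))

def oracle_labels(x):
    """Query the black box: returns hard labels only (no probabilities)."""
    return np.argmax(x @ W_oracle, axis=1)

def train_softmax(x, y, w=None, epochs=300, lr=0.1):
    """Softmax regression via gradient descent; a stand-in for training
    (or fine-tuning, when `w` is given) the CNN surrogate."""
    if w is None:
        w = np.zeros((x.shape[1], N_CLASSES))
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(epochs):
        logits = x @ w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * x.T @ (p - onehot) / len(x)  # cross-entropy gradient step
    return w

# Stage 1 (Copycat): query the Oracle with Non-Problem-Domain data
# (here, random vectors) and train the surrogate on its hard labels.
x_npd = rng.normal(size=(2000, DIM))
w_surrogate = train_softmax(x_npd, oracle_labels(x_npd))

# Stage 2 (Few-Shot Copycat): fine-tune with only a few Problem-Domain
# examples per class (here, a shifted distribution), labeled by the Oracle.
x_pd_few = rng.normal(loc=0.5, size=(5 * N_CLASSES, DIM))
w_surrogate = train_softmax(x_pd_few, oracle_labels(x_pd_few), w=w_surrogate)

# Evaluate how often the surrogate agrees with the Oracle on held-out PD data.
x_test = rng.normal(loc=0.5, size=(500, DIM))
agreement = np.mean(np.argmax(x_test @ w_surrogate, axis=1)
                    == oracle_labels(x_test))
```

The key point mirrored from the paper is the query budget: stage 2 consumes only `5 * N_CLASSES` Oracle queries, so most of the labeling cost falls on cheap NPD data rather than scarce PD images.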