Confident Data-free Model Stealing for Black-box Adversarial Attacks

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: machine learning, black-box attacks, data-free model stealing
Abstract: Deep machine learning models are increasingly deployed in the wild, where they are subject to adversarial attacks. White-box attacks assume full knowledge of the deployed target model, whereas black-box attacks must infer the target model through a curated labeled dataset or abundant data queries before launching attacks. The challenge of black-box attacks lies in acquiring data for querying the target model and effectively learning a substitute model from a minimum number of queries, which can use real or synthetic data. In this paper, we propose an effective and confident data-free black-box attack, CODFE, which steals the target model by querying it with synthetically generated data. The core of our attack is a model stealing optimization consisting of two collaborating models: (i) a substitute model, which imitates the target model, and (ii) a generator, which generates the most representative data to maximize the confidence of the substitute model. We propose a novel training procedure that steers the synthesizing direction based on the confidence of the substitute model and exploits a given set of synthetically generated images over multiple training iterations. We show the theoretical convergence of the proposed model stealing optimization and empirically evaluate its success rate on three datasets. Our results show that the accuracy of the substitute models and the attack success rate can be up to 56% and 34% higher, respectively, than those of state-of-the-art data-free black-box attacks.
One-sentence Summary: Data-free model stealing and black-box attacks
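A minimal PyTorch sketch of the two-player loop the abstract describes follows. Everything beyond the abstract's description is assumed, not taken from the paper: the Generator and Substitute architectures, the hard-label target queries, the confidence objective (negative maximum softmax probability of the substitute), and all hyperparameters (latent_dim, rounds, reuse_iters, batch) are hypothetical illustrations of confidence-steered, data-free model stealing, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical minimal architectures; the paper does not specify them.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, img_shape=(1, 28, 28)):
        super().__init__()
        self.img_shape = img_shape
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, int(torch.prod(torch.tensor(img_shape)))),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.img_shape)

class Substitute(nn.Module):
    def __init__(self, img_shape=(1, 28, 28), num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(int(torch.prod(torch.tensor(img_shape))), 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def steal(target, latent_dim=100, rounds=50, reuse_iters=5, batch=64):
    """One plausible reading of the confidence-steered loop: the generator
    maximizes the substitute's confidence on synthetic data, and each
    synthetic batch is reused for several substitute updates to save queries."""
    gen, sub = Generator(latent_dim), Substitute()
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_s = torch.optim.Adam(sub.parameters(), lr=1e-3)
    for _ in range(rounds):
        # Generator step: steer synthesis toward high-confidence regions
        # of the substitute (negative max softmax probability as the loss).
        z = torch.randn(batch, latent_dim)
        probs = F.softmax(sub(gen(z)), dim=1)
        conf_loss = -probs.max(dim=1).values.mean()
        opt_g.zero_grad(); conf_loss.backward(); opt_g.step()
        # Query the black-box target once per batch (hard labels assumed),
        # then exploit the same synthetic batch over multiple iterations.
        with torch.no_grad():
            x_fixed = gen(torch.randn(batch, latent_dim))
            y_target = target(x_fixed).argmax(dim=1)
        for _ in range(reuse_iters):
            loss = F.cross_entropy(sub(x_fixed), y_target)
            opt_s.zero_grad(); loss.backward(); opt_s.step()
    return sub
```

Under these assumptions, `sub = steal(target_model)` would yield a substitute that any white-box attack can then be run against to craft transferable adversarial examples.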