Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack

21 May 2021 (modified: 05 May 2023) · NeurIPS 2021 Submitted · Readers: Everyone
Abstract: Previous studies have verified that the functionality of black-box models can be stolen given full probability outputs. However, under the more practical hard-label setting, we observe that existing methods suffer from catastrophic performance degradation. We argue this is due to the lack of rich information in hard-label predictions and the over-fitting they cause. To this end, we propose a novel hard-label model stealing method termed black-box dissector, which consists of two erasing-based modules. One is a CAM-driven erasing strategy designed to increase the information extracted from the victim model's hard labels. The other is a random-erasing-based self-knowledge distillation module that utilizes soft labels from the substitute model to mitigate over-fitting. Extensive experiments on four widely-used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of up to $8.27\%$. We also validate the effectiveness and practical potential of our method against real-world APIs and existing defense methods. Furthermore, our method benefits downstream tasks such as transfer-based adversarial attacks.
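To make the two erasing-based ideas in the abstract concrete, the following is a minimal, hypothetical PyTorch-style sketch (not the authors' released code): the substitute model's class activation map is used to erase the regions it already attends to before querying the victim for hard labels, and randomly erased copies of the inputs are trained against the substitute's own soft predictions as a self-distillation term. Names such as substitute, victim_hard_label, and the ResNet-style layer4/fc attributes are illustrative assumptions.

import torch
import torch.nn.functional as F
from torchvision import transforms

def cam_erase(images, substitute, feat_layer, threshold=0.6):
    # Erase regions the substitute already attends to (CAM-driven erasing),
    # so victim hard labels on the erased images carry additional information.
    feats = {}
    handle = feat_layer.register_forward_hook(lambda m, i, o: feats.update(map=o))
    with torch.no_grad():
        logits = substitute(images)
    handle.remove()
    pred = logits.argmax(dim=1)
    fc_w = substitute.fc.weight                      # assumes a GAP + fc classifier head
    cam = F.relu(torch.einsum('bchw,bc->bhw', feats['map'], fc_w[pred]))
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    cam = F.interpolate(cam.unsqueeze(1), size=images.shape[-2:],
                        mode='bilinear', align_corners=False)
    mask = (cam < threshold).float()                 # keep low-attention regions, drop the rest
    return images * mask

def train_step(substitute, victim_hard_label, images, optimizer, alpha=0.5):
    # One illustrative training step: hard-label loss on CAM-erased queries plus
    # random-erasing self-knowledge distillation with the substitute's soft labels.
    substitute.train()
    erased = cam_erase(images, substitute, substitute.layer4)
    y_hard = victim_hard_label(erased)               # victim returns class indices only
    loss_hard = F.cross_entropy(substitute(erased), y_hard)

    with torch.no_grad():
        soft = F.softmax(substitute(images), dim=1)  # substitute's own soft labels
    random_erase = transforms.RandomErasing(p=1.0)
    re_images = torch.stack([random_erase(img) for img in images])
    loss_kd = F.kl_div(F.log_softmax(substitute(re_images), dim=1), soft,
                       reduction='batchmean')

    loss = loss_hard + alpha * loss_kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()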
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip