Abstract: Deep neural networks (DNNs) are susceptible to surrogate attacks, in which adversaries use surrogate data and the corresponding outputs of the target model to build their own stolen model. Model stealing attacks jeopardize model privacy and model owners' commercial interests. To address this issue, this paper proposes a hybrid protection approach, Maximizing the Confidence Differences between benign samples and adversarial samples (MCD), to protect models from theft. First, the LogitNorm approach is used to overcome the overconfidence problem in classifying adversary queries. Then, samples are divided into four groups according to ES and RS, and different groups are poisoned to different degrees. Beyond enhancing defensive performance while preserving model integrity, MCD uses a trigger to verify ownership of the cloned model. Experimental results show that MCD defends well against a variety of original models and attack techniques. Against KnockoffNets and DFME attacks, MCD yields an average defense performance of 54.58% on five datasets, a substantial improvement over other defenses. Compared with other poisoning techniques, the Strong Poisoning (SP) module reduces the adversary's accuracy by 48.23% on average. In addition, MCD overcomes the issue of OOD overconfidence while safeguarding model accuracy in OOD detection, and it reduces the misclassification rate of ID samples across multiple OOD datasets.
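The abstract cites LogitNorm as the remedy for overconfidence when classifying adversary queries. The following is a minimal sketch of LogitNorm-style confidence scoring, not the paper's implementation; the function name, the temperature value `tau`, and the usage pattern are illustrative assumptions.

```python
# Minimal sketch (assumed implementation): LogitNorm-style confidence scoring.
import torch
import torch.nn.functional as F

def logitnorm_confidence(logits: torch.Tensor, tau: float = 0.04) -> torch.Tensor:
    """Return max-softmax confidence after L2 logit normalization.

    `tau` is an assumed temperature hyperparameter; the paper may use a
    different value or schedule.
    """
    norm = logits.norm(p=2, dim=-1, keepdim=True) + 1e-7  # avoid division by zero
    normalized = logits / (norm * tau)                    # LogitNorm scaling
    return F.softmax(normalized, dim=-1).max(dim=-1).values

# Usage sketch: lower confidence on suspicious (likely surrogate/OOD) queries
# would allow a defense to poison those responses more aggressively than
# responses to benign queries.
logits = torch.randn(8, 10)            # batch of 8 queries, 10 classes
scores = logitnorm_confidence(logits)  # one confidence score per query
```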