Abstract: Deep neural network (DNN) models are widely used in various fields, such as pattern recognition and natural language processing, and provide considerable commercial value to their owners. Embedding a digital watermark in the model allows the legitimate owner to detect unauthorized use of the model. However, the existing DNN watermarking methods are vulnerable to model extraction attacks since the watermark task and the original model task are independent. In this article, a novel collaborative DNN watermarking framework is proposed to defend against model extraction attacks by establishing cooperation between the watermark generation and embedding. Specifically, the trigger samples are not only imperceptible to ensure perceptual stealth security but also infused with target-label information to guide the following feature associations. In the process of watermark embedding, the feature representation of trigger samples is forced to be similar to that of the task distribution samples via feature coupling. Consequently, the trigger samples from our framework can be recognized in the stolen model as task distribution samples, so that the ownership of the model can be successfully verified. Extensive experiments on CIFAR10, CIFAR100, and ImageNet demonstrate the effectiveness and superior performance of the proposed watermarking framework against various model extraction attacks.
Loading