Certified Watermarks for Neural Networks

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: certified defense, watermarking, backdoor attack
Abstract: Watermarking is a commonly used strategy to protect creators' rights to digital images, videos, and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. In practice, however, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust than previous watermarking methods.
One-sentence Summary: We propose the first certifiable watermark for neural networks, which is also empirically more robust than prior watermarks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=DPGhREq6uC
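Illustrative sketch: the abstract describes certifying the watermark via randomized smoothing over the model parameters, so that the watermark survives any modification within a certain $\ell_2$ radius. The snippet below is a minimal, hypothetical sketch of only the Monte Carlo smoothing step -- estimating trigger-set accuracy when Gaussian noise is added to the parameters -- not the paper's full certification procedure. The function name, sigma, n_samples, and trigger_loader are illustrative assumptions.

# Hypothetical sketch: average trigger-set accuracy of a model whose
# parameters are perturbed by i.i.d. Gaussian noise, the quantity that
# randomized smoothing over parameter space is built on. The certified
# l2 radius itself would be derived from this estimate using the bound
# in Chiang et al., which is not reproduced here.
import copy
import torch


def smoothed_trigger_accuracy(model, trigger_loader, sigma=1.0, n_samples=100):
    """Mean trigger-set accuracy over n_samples draws of N(0, sigma^2)
    noise added independently to every model parameter."""
    model.eval()
    accs = []
    for _ in range(n_samples):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            # Perturb every parameter of the copied model in place.
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * sigma)
            correct, total = 0, 0
            # Evaluate the perturbed model on the watermark trigger set.
            for x, y in trigger_loader:
                pred = noisy(x).argmax(dim=1)
                correct += (pred == y).sum().item()
                total += y.numel()
        accs.append(correct / total)
    return sum(accs) / len(accs)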