Dimension-independent Certified Neural Network Watermarks via Mollifier Smoothing

Published: 24 Apr 2023, Last Modified: 15 Jun 2023ICML 2023 PosterEveryoneRevisions
Abstract: Certified\_Watermarks is the first to provide a watermark certificate against $l_2$-norm watermark removal attacks, by leveraging the randomized smoothing techniques for certified robustness to adversarial attacks. However, the randomized smoothing techniques suffer from hardness of certified robustness in high-dimensional space against $l_p$-norm attacks for large $p$ ($p>2$). The certified watermark method based on the randomized smoothing is no exception, i.e., fails to provide meaningful certificates in high-dimensional space against the $l_p$-norm watermark removal attacks ($p>2$). By leveraging mollifier theory, this paper proposes a mollifier smoothing method with dimension-independent certified radius of our proposed smooth classifier, for conducting the certified watermark problem against the $l_p$-norm watermark removal attacks ($1 \leq p \leq \infty$) for high parameter dimension $d$. Based on partial differential equation (PDE) theory, an approximation of mollifier smoothing is developed to alleviate the inefficiency of sampling and prediction in the randomized smoothing as well as numerical integration in the mollifier smoothing, while maintaining the certified watermark against the $l_p$-norm watermark removal attacks ($1 \leq p \leq \infty$).
Submission Number: 2070
Loading