Track: Regular paper
Keywords: Watermark Removal, Diffusion Models, Generative AI, Robustness
TL;DR: We present a quality-preserving pipeline that removes invisible watermarks from diffusion-generated images while maintaining high fidelity. It is the winning solution to the NeurIPS 2024 watermark removal challenge.
Abstract: Content watermarking is an important tool for the authentication and copyright protection of digital media.
However, it is unclear whether existing watermarks are robust against adversarial attacks.
We present the $\textbf{winning solution}$ to the NeurIPS 2024 $\textit{Erasing the Invisible Challenge}$, which stress-tests watermark robustness under varying degrees of adversary knowledge.
The challenge consisted of two tracks, a black-box track and a beige-box track, which differ in whether the adversary knows which watermarking method the provider used.
For the $\textbf{beige-box}$ track, we leverage an $\textit{adaptive}$ VAE-based evasion attack, combined with test-time optimization and color-contrast restoration in CIELAB space to preserve image quality.
For the $\textbf{black-box}$ track, we first cluster images based on their artifacts in the spatial or frequency domain.
Then, to each cluster, we apply image-to-image diffusion models with controlled noise injection and semantic priors from ChatGPT-generated captions, using optimized parameter settings.
Empirical evaluations demonstrate that our method $\textbf{achieves near-perfect watermark removal}$ (95.7\%) with negligible impact on the quality of the resulting images.
We hope that our attacks inspire the development of more robust image watermarking methods.
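As an illustration of the "controlled noise injection" step mentioned for the black-box track, the following is a minimal sketch of the standard forward-diffusion noising used by image-to-image diffusion pipelines. It is not the authors' implementation: the linear beta schedule, the mapping from a `strength` value to a timestep, and the function names `alpha_bar_schedule` and `inject_noise` are all assumptions made for this example. The denoising pass that would follow (for instance, conditioned on a caption) is omitted.

```python
import numpy as np


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    The linear schedule and its endpoints are assumptions for this sketch.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def inject_noise(image, strength, alpha_bar, rng):
    """Noise an image up to the timestep implied by `strength`.

    image:    float array scaled to [-1, 1]
    strength: value in (0, 1]; larger values inject more noise
    Returns the noised image and the chosen timestep index.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0, 1]")
    # Hypothetical mapping from strength to a timestep index.
    t = max(int(round(strength * len(alpha_bar))) - 1, 0)
    a = alpha_bar[t]
    noise = rng.standard_normal(image.shape)
    # Forward diffusion: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    noisy = np.sqrt(a) * image + np.sqrt(1.0 - a) * noise
    return noisy, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # stand-in for a real image
    schedule = alpha_bar_schedule()
    for strength in (0.1, 0.3, 0.6):
        noisy, t = inject_noise(image, strength, schedule, rng)
        print(f"strength={strength} timestep={t} signal_scale={np.sqrt(schedule[t]):.3f}")
```

In a sketch like this, `strength` controls the trade-off: a low value keeps most of the original pixels (and possibly the watermark), while a high value destroys more of the watermark signal but leaves the denoiser more freedom to alter the image content.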
Submission Number: 61