Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Abstract: The intellectual property of deep image-to-image models can
be protected by so-called box-free watermarking, which uses
an encoder to embed invisible copyright marks into the
model's output images and a decoder to extract them.
Prior works have improved watermark robustness,
focusing on the design of better watermark encoders. In this
paper, we reveal an overlooked vulnerability of the unprotected
watermark decoder, which is jointly trained with the
encoder and can be exploited to train a watermark removal
network. To defend against such an attack, we propose the
decoder gradient shield (DGS) as a protection layer in the
decoder API to prevent gradient-based watermark removal
with a closed-form solution. The fundamental idea is inspired
by the classical adversarial attack, but is used
for the first time as a defensive mechanism in box-free
model watermarking. We then demonstrate that DGS can
reorient and rescale the gradient directions of watermarked
queries and stop the watermark remover’s training loss from
converging to the level it reaches without DGS, while retaining
decoder output image quality. Experimental results verify the
effectiveness of the proposed method. The code is
available at https://github.com/haonanAN309/CVPR-2025-Official-Implementation-Decoder-Gradient-Shield.