On Forging Semantic Watermarks in Diffusion Models: A Theoretical Perspective

Cheng-Yi Lee; Yu-Feng Chen; Chun-Shien Lu; Jun-Cheng Chen

Published: 24 Sept 2025, Last Modified: 07 Nov 2025

Venue: NeurIPS 2025 Workshop GenProCC

License: CC BY 4.0

Track: Regular paper

Keywords: Generative Watermarking, Robustness, Diffusion Model, Forgery Attack, Rate-distortion Theory

TL;DR: We re-examine semantic watermarking in diffusion models through rate-distortion theory, explain why forgery attacks occur, and propose more robust model-specific properties.

Abstract: Semantic watermarks have emerged as a promising technique for latent diffusion models, embedding information by subtly modifying the initial latent noise. Although they are robust to common perturbations, recent studies indicate that they remain vulnerable to black-box forgery attacks. In this paper, we provide a theoretical analysis of such attacks through the lens of rate-distortion theory. We also propose the CrossRobust metric to evaluate watermark robustness against black-box forgery attacks across proxy models. This metric is grounded in the concept of model specificity: the watermark's tolerance to forgery attacks while it remains detectable by the original model. We further show that model mismatch inevitably introduces an irreducible distortion error when proxy models are used. Extensive experiments demonstrate that the proposed metric can effectively estimate the robustness of existing approaches and offer new insights into the design of improved semantic watermarks and verification mechanisms.
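
For readers unfamiliar with the framework named in the abstract, the classical rate-distortion function from information theory is sketched below. This is the standard textbook definition, not the paper's own formulation, which the abstract does not state. The symbols (source X, reconstruction X-hat, distortion measure d, distortion budget D) are the usual textbook ones and are assumptions here, not the authors' notation.

```latex
% Classical rate-distortion function (textbook definition, not the paper's notation):
% the minimum rate needed to describe X so that the expected distortion is at most D.
R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\ \mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})

% Worked instance: Gaussian source X ~ N(0, \sigma^2) under squared-error distortion.
R(D) \;=\;
\begin{cases}
\dfrac{1}{2}\log\dfrac{\sigma^{2}}{D}, & 0 < D \le \sigma^{2},\\[6pt]
0, & D > \sigma^{2}.
\end{cases}
```

In this Gaussian instance, halving the allowed distortion D costs an extra half bit per sample (with the logarithm taken base 2), which shows the trade-off between rate and distortion that this kind of analysis quantifies.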

Submission Number: 60
