THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE

Published: 26 Jan 2026 · Last Modified: 17 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: watermarking, deep learning, AI security, re-watermarking, attack
Abstract: Watermarking has been widely used for copyright protection of digital images. Deep learning-based (DL) watermarking systems have recently emerged as more effective than traditional methods, offering improved fidelity and resilience against attacks. Among the various threats to DL watermarking systems, self-re-watermarking attacks represent a critical and underexplored challenge. In such an attack, the same encoder is maliciously reused to embed a new message into an already watermarked image. This re-embedding prevents the original decoder from retrieving the original watermark while introducing no perceptual artifacts. In this work, we make two key contributions. First, we introduce the self-re-watermarking threat model as a novel attack vector and demonstrate that existing state-of-the-art watermarking methods consistently fail under such attacks. Second, we develop a self-aware watermarking framework to defend against this threat. Our key insight is that limiting the sensitivity of the watermarking models to their inputs resists the re-embedding of new watermarks. To achieve this, the proposed self-aware deep watermarking framework extends Lipschitz constraints to the watermarking process, regulating encoder–decoder sensitivity in a principled manner. In addition, the framework incorporates re-watermarking adversarial training, which further constrains sensitivity to distortions arising from re-embedding. The proposed method provides theoretical bounds on message recoverability under malicious encoder-based re-watermarking and demonstrates strong empirical robustness across diverse re-watermarking scenarios. Moreover, it maintains high visual fidelity and achieves competitive robustness against common image processing distortions compared to state-of-the-art watermarking methods. This work establishes a robust defense against both standard distortions and self-re-watermarking attacks. Code available at https://github.com/SVithurabiman/SRW.
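A minimal sketch of the two ideas described in the abstract, assuming a toy encoder architecture: reusing a public encoder on an already watermarked image (the self-re-watermarking attack), and bounding layer-wise sensitivity via spectral normalization as one way to impose a Lipschitz constraint. The `ToyEncoder` class, message length, and tensor shapes are illustrative assumptions, not the paper's actual architecture or training pipeline.

```python
# Illustrative sketch only (not the authors' code).
import torch
import torch.nn as nn

MSG_BITS = 32  # assumed message length


class ToyEncoder(nn.Module):
    """Toy watermark encoder: (image, message) -> watermarked image.
    Spectral normalization bounds each layer's Lipschitz constant near 1,
    limiting the model's sensitivity to input perturbations."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.utils.spectral_norm(nn.Conv2d(3 + MSG_BITS, 32, 3, padding=1)),
            nn.ReLU(),
            nn.utils.spectral_norm(nn.Conv2d(32, 3, 3, padding=1)),
        )

    def forward(self, image, message):
        # Broadcast the bit string over the spatial dimensions and concatenate.
        b, _, h, w = image.shape
        msg_map = message.view(b, MSG_BITS, 1, 1).expand(b, MSG_BITS, h, w)
        # Residual embedding: add a small watermark perturbation to the image.
        return image + self.net(torch.cat([image, msg_map], dim=1))


encoder = ToyEncoder()
image = torch.rand(1, 3, 64, 64)
m_orig = torch.randint(0, 2, (1, MSG_BITS)).float()
m_attack = torch.randint(0, 2, (1, MSG_BITS)).float()

watermarked = encoder(image, m_orig)             # legitimate embedding
re_watermarked = encoder(watermarked, m_attack)  # self-re-watermarking attack:
# the same encoder is applied again with a new message; an unconstrained
# decoder would then tend to recover m_attack instead of m_orig.
```

The paper's defense additionally trains against such re-embedded images (re-watermarking adversarial training); the snippet above only illustrates the attack surface and the sensitivity-limiting constraint in isolation.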
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17673