Keywords: Deep Learning, Image Watermarking, Watermark Removal Attack
Abstract: Image watermarking embeds imperceptible signals into AI-generated images for deepfake detection and provenance verification. Although recent semantic-level watermarking methods demonstrate strong resistance against conventional pixel-level removal attacks, their robustness against more advanced removal strategies remains underexplored, raising concerns about their reliability in practical scenarios. Existing removal attacks primarily operate in the pixel domain without altering image semantics, which limits their effectiveness against semantic-level watermarks.
In this paper, we propose Next Frame Prediction Attack (NFPA), the first semantic-level removal attack. Unlike pixel-level attacks, NFPA formulates watermark removal as a video generation task: it treats the watermarked image as the initial frame and aims to subtly manipulate the image semantics to generate the next-frame image, i.e., the unwatermarked image.
We conduct a comprehensive evaluation on eight state-of-the-art image watermarking schemes, demonstrating that NFPA consistently outperforms thirteen removal attack baselines in terms of the trade-off between watermark removal and image quality. Our results reveal the vulnerabilities of current image watermarking methods and highlight the urgent need for more robust watermarks.
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 3334
Loading