Keywords: Content watermark, watermark removal, watermark forging
Abstract: AI-Generated Content (AIGC) is gaining great popularity, with many emerging commercial services using advanced generative models to create realistic images and fluent text. Regulating such content is crucial to prevent policy violations, such as unauthorized commercialization or unsafe content distribution.
Watermarking is a promising solution for content attribution and verification, and numerous watermarking approaches have been proposed recently. However, we demonstrate its vulnerability to two key attacks: (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely, bypassing the service provider's regulation. (2) Watermark forging: the adversary can create illegal content bearing a forged watermark from another user, causing the service provider to make incorrect attributions.
We propose Warfare, a unified attack framework leveraging a pre-trained diffusion model for content processing and a generative adversarial network for watermark manipulation. Evaluations across datasets and embedding setups show that Warfare can achieve high success rates while maintaining the quality of the generated content. We further introduce Warfare-Plus, which enhances efficiency without compromising effectiveness.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25085