Keywords: Content watermark, watermark removal, watermark forging
Abstract: AI-Generated Content (AIGC) is gaining great popularity, with many emerging commercial services using advanced generative models to create realistic images and fluent text. Regulating such content is crucial to prevent policy violations, such as unauthorized commercialization or unsafe content distribution.
Watermarking is a promising solution for content attribution and verification, and numerous watermarking approaches have been proposed recently. However, we demonstrate its vulnerability to two key attacks: (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely, bypassing the service provider's regulation. (2) Watermark forging: the adversary can create illegal content bearing a forged watermark from another user, causing the service provider to make incorrect attributions.
We propose Warfare, a unified attack framework leveraging a pre-trained diffusion model for content processing and a generative adversarial network for watermark manipulation. Evaluations across datasets and embedding setups show that Warfare can achieve high success rates while maintaining the quality of the generated content. We further introduce Warfare-Plus, which enhances efficiency without compromising effectiveness.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25085