Abstract: Model watermarking has emerged as a promising solution for protecting the intellectual property of Artificial Intelligence (AI)-generated images. However, the security of watermark verification systems remains insufficiently studied, particularly under adversarial conditions. In this work, we propose a novel watermark forgery attack framework that enables an attacker to train a counterfeit watermark extractor capable of consistently extracting a forged watermark from images containing legitimate watermarks (generated by the victim model), thereby subverting copyright verification. Specifically, we develop the counterfeit extractor based on a ResNet-18 backbone and design a hybrid loss function to align the extracted watermarks with the pre-defined targets. Through this approach, the trained extractor reliably outputs the forged watermark for victim-model-generated images while producing random outputs for clean images. Extensive experiments demonstrate the attack's effectiveness, with the forged extractor achieving over 99% accuracy, comparable to the original watermark extractor's performance. These findings reveal critical security vulnerabilities in current model watermarking systems and provide important insights for developing more robust watermarking solutions.
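
To make the described setup concrete, the sketch below illustrates one plausible instantiation of the counterfeit extractor and hybrid loss: a ResNet-18 backbone whose outputs on victim-model-generated (watermarked) images are pulled toward the attacker's forged bit string, while outputs on clean images are pushed toward uninformative predictions. This is a minimal assumption-laden illustration, not the authors' code; the class name `ForgedExtractor`, the 48-bit watermark length, the BCE alignment term, and the 0.5-target "randomness" term are all hypothetical choices.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical sketch of a counterfeit watermark extractor (not the paper's code).
# Assumptions: a 48-bit watermark, a BCE alignment term on watermarked images,
# and a term that drives outputs on clean images toward 0.5 (uninformative).

class ForgedExtractor(nn.Module):
    def __init__(self, num_bits: int = 48):
        super().__init__()
        backbone = models.resnet18(weights=None)                 # ResNet-18 backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, num_bits)
        self.net = backbone

    def forward(self, x):
        return torch.sigmoid(self.net(x))                        # per-bit probabilities

def hybrid_loss(model, wm_images, clean_images, forged_bits, lam=1.0):
    """Align outputs on watermarked images with the forged bit string,
    while keeping outputs on clean images uninformative (near 0.5)."""
    bce = nn.BCELoss()
    pred_wm = model(wm_images)
    target = forged_bits.expand_as(pred_wm).float()
    align_term = bce(pred_wm, target)                            # forgery alignment
    pred_clean = model(clean_images)
    rand_term = bce(pred_clean, torch.full_like(pred_clean, 0.5))
    return align_term + lam * rand_term

# Minimal usage example with dummy tensors standing in for real image batches.
if __name__ == "__main__":
    model = ForgedExtractor(num_bits=48)
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)
    forged_bits = torch.randint(0, 2, (1, 48))
    wm_batch = torch.randn(8, 3, 256, 256)     # stand-in for victim-model outputs
    clean_batch = torch.randn(8, 3, 256, 256)  # stand-in for clean images
    loss = hybrid_loss(model, wm_batch, clean_batch, forged_bits)
    optim.zero_grad()
    loss.backward()
    optim.step()
    print(f"hybrid loss: {loss.item():.4f}")
```

The key design point the abstract implies is the two-part objective: one term teaches the extractor to report the forged watermark on legitimately watermarked images, and the other prevents it from reporting that watermark on clean images, so the counterfeit extractor mimics the behavior expected of a genuine verification tool.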