Erasing and Tampering Statistical Watermarks via Re-watermarking in Large Language Models

TMLR Paper5940 Authors

19 Sept 2025 (modified: 29 Nov 2025), Withdrawn by Authors, CC BY 4.0
Abstract: The rapid development and widespread adoption of large language models have intensified concerns about copyright disputes, misinformation spread, and content authenticity. Statistical watermarking has been proposed as a potential solution for content source verification, though its reliability remains questionable. This study examines a re-watermarking attack based on text rephrasing. Our theoretical analyses and experimental results demonstrate that: (1) new watermarks can be successfully applied to already watermarked text; (2) these new watermarks effectively overwrite the originals, making them undetectable; and (3) compared to existing rephrasing-only attacks, re-watermarking causes comparable degradation in text fidelity. These findings reveal significant vulnerabilities in statistical watermarking techniques, challenging their effectiveness as reliable mechanisms for content attribution.
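To make the three claims in the abstract concrete, here is a minimal toy sketch of a re-watermarking attack. It is not the authors' method: the abstract does not name a watermarking scheme, so the sketch assumes a simple keyed "green-list" watermark over integer tokens, and it replaces real text rephrasing with a crude stand-in that keeps a fraction of the original tokens and resamples the rest under a new key. All names and parameters (`VOCAB`, `GAMMA`, `bias`, `keep`, `rewatermark`, the key strings) are hypothetical.

```python
import hashlib
import math
import random

VOCAB = 1000   # toy vocabulary size (assumption)
GAMMA = 0.5    # fraction of the vocabulary that is "green" (assumption)


def is_green(key: str, prev: int, tok: int) -> bool:
    """Keyed pseudo-random vocabulary partition, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev}:{tok}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def sample(key: str, prev: int, rng: random.Random, bias: float) -> int:
    """With probability `bias` force a green token; otherwise sample uniformly."""
    want_green = rng.random() < bias
    while True:
        tok = rng.randrange(VOCAB)
        if not want_green or is_green(key, prev, tok):
            return tok


def generate(key: str, n: int, rng: random.Random, bias: float = 0.9) -> list[int]:
    """Toy stand-in for a watermarked model: one start token plus n biased tokens."""
    toks = [rng.randrange(VOCAB)]
    for _ in range(n):
        toks.append(sample(key, toks[-1], rng, bias))
    return toks


def z_score(key: str, toks: list[int]) -> float:
    """Standardized excess of green tokens over what unwatermarked text would show."""
    n = len(toks) - 1
    green = sum(is_green(key, p, t) for p, t in zip(toks, toks[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def rewatermark(toks: list[int], new_key: str, rng: random.Random,
                keep: float = 0.2, bias: float = 0.9) -> list[int]:
    """Toy stand-in for rephrasing with a second watermarked model:
    each token survives with probability `keep`, otherwise it is
    resampled under `new_key`."""
    out = [toks[0]]
    for tok in toks[1:]:
        if rng.random() < keep:
            out.append(tok)
        else:
            out.append(sample(new_key, out[-1], rng, bias))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    text = generate("key-A", 200, rng)
    attacked = rewatermark(text, "key-B", rng)
    print("original text, key-A z-score:", z_score("key-A", text))
    print("attacked text, key-A z-score:", z_score("key-A", attacked))
    print("attacked text, key-B z-score:", z_score("key-B", attacked))
```

In this toy setting, the original text scores well above a typical detection threshold under its own key, while the attacked text scores near zero under the original key and high under the attacker's key. The sketch says nothing about text fidelity, which the abstract evaluates separately.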
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Samuel_Vaiter1
Submission Number: 5940