Google's LLM Watermarking System is Vulnerable to Layer Inflation Attack

Published: 02 Mar 2026 · Last Modified: 12 Mar 2026 · ICLR 2026 Trustworthy AI · CC BY 4.0
Keywords: LLM Watermarking, Watermark Removal Attack, Robustness
Abstract: Google's SynthID-Text, the first ever production-ready generative watermark system for large language model, designs a novel Tournament-based method that achieves the state-of-the-art detectability for identifying AI-generated texts. This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. Specifically, we prove that the mean score used in SynthID-Text is inherently vulnerable to increased tournament layers, and design a \emph{black-box layer inflation attack} to completely break SynthID-Text. Source code is available at \url{https://github.com/romidi80/Synth-ID-MeanScore-Break}.
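To make the objects in the abstract concrete, the following is a minimal, self-contained sketch of tournament-based watermark sampling and a mean-score detector. It is not SynthID-Text's implementation: the hash-based `g_value` scheme, the pairing rule, and all function names are illustrative assumptions; only the overall shape (layered tournaments over candidate tokens, a mean of pseudorandom g-values as the detection statistic) follows the description above.

```python
import hashlib

def g_value(token, position, key, layer):
    """Pseudorandom bit derived from (token, position, key, layer).
    Hypothetical hash construction; a real system would use a keyed PRF."""
    h = hashlib.sha256(f"{key}:{position}:{layer}:{token}".encode()).digest()
    return h[0] & 1

def tournament_sample(candidates, position, key, num_layers):
    """Layered tournament: at each layer, pair up surviving candidates and
    keep the one with the higher g-value for that layer (ties go to the first)."""
    pool = list(candidates)
    for layer in range(num_layers):
        survivors = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            ga = g_value(a, position, key, layer)
            gb = g_value(b, position, key, layer)
            survivors.append(a if ga >= gb else b)
        if len(pool) % 2:           # odd one out advances unopposed
            survivors.append(pool[-1])
        pool = survivors
        if len(pool) == 1:
            break
    return pool[0]

def mean_score(tokens, key, num_layers):
    """Detection statistic: mean g-value over all tokens and layers.
    Watermarked text is biased above the 0.5 baseline of unwatermarked text."""
    total, count = 0.0, 0
    for pos, tok in enumerate(tokens):
        for layer in range(num_layers):
            total += g_value(tok, pos, key, layer)
            count += 1
    return total / count
```

With 8 candidates per position and 3 layers, each tournament winner carries a g-value bias of roughly 0.75 per layer, so the mean score of watermarked text separates cleanly from the ~0.5 baseline; the paper's analysis concerns how this statistic degrades as the layer count is inflated.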
Submission Number: 110