Google's LLM Watermarking System is Vulnerable to Layer Inflation Attack
Keywords: LLM Watermarking, Watermark Removal Attack, Robustness
Abstract: Google's SynthID-Text, the first production-ready generative watermarking system for large language models, introduces a novel tournament-based method that achieves state-of-the-art detectability for identifying AI-generated text.
This paper presents the first theoretical analysis of SynthID-Text, focusing on its detection performance and watermark robustness, complemented by empirical validation. Specifically, we prove that the mean score used in SynthID-Text is inherently vulnerable to an increased number of tournament layers, and we design a \emph{black-box layer inflation attack} that completely breaks SynthID-Text.
Source code is available at \url{https://github.com/romidi80/Synth-ID-MeanScore-Break}.
Submission Number: 110