Keywords: language models, watermarks
Abstract: A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high detection overhead.
In parallel, recent research has focused on undetectability—a property ensuring that watermarks remain difficult for adversaries to detect and spoof.
In this work, we introduce a new class of watermarking schemes constructed through *probabilistic automata*.
We present two instantiations: (i) a practical scheme with exponential generation diversity and computational efficiency, and (ii) a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Extensive experiments on LLaMA-3B and Mistral-7B validate the superior robustness and efficiency of our practical scheme.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16256