Keywords: LLM, multibit watermarks, watermarks
TL;DR: We propose a multibit LLM watermark that encodes the full message in every token.
Abstract: With LLM watermarking already being deployed commercially, practical applications increasingly require *multibit* watermarks that encode more complex payloads, such as user IDs or timestamps, into the generated text.
In this work, we propose a fundamentally new approach to multibit watermarking: binomial encoding, which directly encodes every bit of the payload at every token position.
We complement our approach with a *stateful encoder* that dynamically redirects encoding pressure toward underencoded bits during generation.
Our evaluation against 8 baselines on up to 64-bit payloads shows that our scheme achieves superior message accuracy and robustness, with the gap to baseline methods widening in more relevant settings (i.e., large payloads and low-distortion regimes).
At the same time, we challenge the evaluation metrics used in prior work, highlighting their lack of practical insight, and introduce *per-bit confidence scoring* as a practically relevant metric for evaluating multibit LLM watermarks.
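To make the idea of a per-bit confidence score concrete, the following is a toy sketch and not the scheme proposed in the paper, whose details the abstract does not give. It assumes, hypothetically, that each token casts one 0/1 vote for every payload bit and that, without a watermark, each vote is a fair coin flip. Under those assumptions, a bit's confidence can be taken as one minus a two-sided binomial p-value on its vote count. The names `binomial_tail` and `decode_with_confidence` are illustrative only.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def decode_with_confidence(votes: list[list[int]]) -> list[tuple[int, float]]:
    """Decode each payload bit by majority vote and attach a confidence score.

    votes[t][j] is the (hypothetical) 0/1 vote that token t casts for bit j.
    Assumption: with no watermark, every vote is an independent fair coin flip.
    """
    n = len(votes)
    num_bits = len(votes[0])
    decoded = []
    for j in range(num_bits):
        ones = sum(v[j] for v in votes)
        bit = 1 if 2 * ones >= n else 0  # ties resolve to 1
        majority = max(ones, n - ones)
        # Two-sided p-value: chance of a majority at least this lopsided
        # arising without a watermark, capped at 1.
        p_value = min(1.0, 2 * binomial_tail(majority, n))
        decoded.append((bit, 1.0 - p_value))
    return decoded


if __name__ == "__main__":
    # 10 tokens, 2-bit payload: bit 0 is voted unanimously, bit 1 is split 5/5.
    votes = [[1, 1]] * 5 + [[1, 0]] * 5
    for j, (bit, conf) in enumerate(decode_with_confidence(votes)):
        print(f"bit {j}: decoded={bit} confidence={conf:.4f}")
```

A score of this kind lets a detector report which payload bits are reliable instead of a single aggregate accuracy, which is the sort of practical signal a per-bit metric is meant to provide.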
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 366