TL;DR: We introduce a practical and statistically valid watermarking scheme, provide theoretical guarantees, and empirically evaluate our scheme.
Abstract: Watermarking, the process by which Large Language Model (LLM) servers embed an imperceptible signal at inference time in order to detect text generated by their own models, has grown in importance due to the significant improvements in natural language processing tasks by modern LLMs. Current approaches are often impractical due to generation latency, detection time, degradation of text quality, or lack of robustness; such problems often arise from the focus on token-level watermarking, which ignores the inherent structure of text. In this work, we introduce a new scheme, GaussMark, that is simple and efficient to implement, has formal statistical guarantees, comes at no cost in generation latency, and embeds the watermark into the weights of the model itself, providing a structural watermark. Our approach is based on Gaussian independence testing and is motivated by recent empirical observations that minor additive corruptions to LLM weights can result in models of identical (or even improved) quality. We provide formal statistical bounds on the validity and power of our procedure and, through an extensive suite of experiments, demonstrate that GaussMark is reliable, efficient, relatively robust to corruption, and can be instantiated with essentially no loss in model quality.
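To make the high-level description above concrete, the following is a minimal, illustrative sketch (Python with NumPy/SciPy) of what a keyed Gaussian weight perturbation and a corresponding Gaussian independence test might look like. The function names, the choice of perturbing a single weight matrix, and the placeholder text-derived feature vector are assumptions made here for illustration; this is not the authors' implementation.

```python
# Illustrative sketch only -- NOT the paper's released implementation.
# Idea mirrored from the abstract: a keyed Gaussian perturbation is added to a
# model weight matrix at watermarking time, and detection is framed as a
# Gaussian independence test against the secret key.
import numpy as np
from scipy import stats


def embed_watermark(weight: np.ndarray, seed: int, sigma: float = 1e-3):
    """Return a watermarked copy of `weight` and the secret Gaussian key."""
    rng = np.random.default_rng(seed)
    key = rng.standard_normal(weight.shape)   # secret key xi, known only to the server
    return weight + sigma * key, key          # theta' = theta + sigma * xi


def detection_pvalue(key: np.ndarray, feature: np.ndarray) -> float:
    """One-sided Gaussian test: is the text-derived `feature` aligned with the key?

    Under the null hypothesis (text was not generated by the watermarked
    weights), the normalized inner product below is approximately N(0, 1);
    a small p-value is evidence that the text came from the watermarked model.
    In the paper the feature would be a statistic computed from the candidate
    text (assumption here: we simply pass in a placeholder vector).
    """
    stat = float(key.ravel() @ feature.ravel()) / float(np.linalg.norm(feature))
    return float(stats.norm.sf(stat))         # P(Z >= stat) for Z ~ N(0, 1)


# Toy usage with random stand-ins for a weight matrix and a text-derived feature.
theta = np.random.default_rng(0).standard_normal((64, 64))
theta_wm, xi = embed_watermark(theta, seed=1234)
p_null = detection_pvalue(xi, np.random.default_rng(2).standard_normal((64, 64)))
print(f"p-value for unrelated text: {p_null:.3f}")  # roughly uniform on [0, 1] under the null
```

Under these assumptions, detection reduces to thresholding the p-value at a chosen significance level, which is what gives the test its formal validity guarantee: unrelated text is flagged with probability at most the chosen level.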
Lay Summary: As AI models become better at writing human-like content, it is getting harder to tell whether a piece of text was written by a human or by a language model. This creates challenges in areas like education, journalism, and law, where it can be important to know who actually created the content.
This research introduces a new method called GaussMark that helps identify when a text has been written by a language model. Instead of changing the text sampling process itself (as previous approaches do), GaussMark works by slightly adjusting the weights of the language model in a way that leaves behind a hidden signal. This signal can later be detected, showing that the text came from that specific model.
The method is fast, easy to use, and does not harm the quality or speed of the text generation. It is also based on sound statistical reasoning, which means it can provide strong evidence that a piece of text was created by a specific language model. In summary, GaussMark offers a reliable and efficient way to track the origin of AI-generated text, making these powerful tools more trustworthy and responsible.
Primary Area: Theory->Everything Else
Keywords: Watermarking, Trustworthy ML, Language Models
Submission Number: 2237