TL;DR: We propose a multi-bit and stealthy watermark for large language models.
Abstract: Watermarking for large language models (LLMs) offers a promising approach to identifying AI-generated text. Existing approaches, however, either compromise the distribution of the originally generated text or are limited to embedding zero-bit information, which allows only watermark detection without identification. We present StealthInk, a stealthy multi-bit watermarking scheme that preserves the original text distribution while enabling the embedding of provenance data, such as userID, TimeStamp, and modelID, within LLM-generated text. This enables fast traceability without requiring access to the language model's API or prompts. We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate, which provides insight into how to enhance capacity. Comprehensive empirical evaluations across diverse tasks highlight the stealthiness, detectability, and resilience of StealthInk, establishing it as an effective solution for LLM watermarking applications.
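To make the multi-bit payload concrete, the sketch below shows one plausible way provenance fields such as a user ID, a timestamp, and a model ID could be packed into a fixed-width bit string for a multi-bit watermark to carry. This is an illustrative assumption, not StealthInk's actual embedding algorithm (which is not described in this abstract); the field widths and the helper functions `pack_payload` / `unpack_payload` are hypothetical.

```python
# Hypothetical illustration of a multi-bit provenance payload; field widths are assumptions.
import time

USER_BITS, TIME_BITS, MODEL_BITS = 20, 32, 12  # assumed widths, 64-bit payload total

def pack_payload(user_id: int, timestamp: int, model_id: int) -> int:
    """Concatenate (user_id, timestamp, model_id) into one integer payload."""
    assert user_id < (1 << USER_BITS) and timestamp < (1 << TIME_BITS) and model_id < (1 << MODEL_BITS)
    return (user_id << (TIME_BITS + MODEL_BITS)) | (timestamp << MODEL_BITS) | model_id

def unpack_payload(payload: int) -> tuple[int, int, int]:
    """Recover (user_id, timestamp, model_id) from a decoded payload."""
    model_id = payload & ((1 << MODEL_BITS) - 1)
    timestamp = (payload >> MODEL_BITS) & ((1 << TIME_BITS) - 1)
    user_id = payload >> (TIME_BITS + MODEL_BITS)
    return user_id, timestamp, model_id

payload = pack_payload(user_id=4821, timestamp=int(time.time()), model_id=7)
print(unpack_payload(payload))  # -> (4821, <timestamp>, 7)
```

In such a setup, the watermark encoder would embed the payload bits into generated text and the detector would recover them without access to the model's API or prompts, enabling the traceability described above.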
Lay Summary: As powerful AI models like ChatGPT become more common, so does the risk of misuse—such as generating fake news, scams, or academic plagiarism. To help address these concerns, researchers have been exploring ways to add hidden “watermarks” to AI-generated text. These invisible tags can help detect whether a text was written by a machine and even trace where it came from.
Our work introduces StealthInk, a new technique that embeds more information into AI-generated text while keeping the text natural and hard to distinguish from human writing. Unlike many past methods that either distort the language or offer only simple yes/no answers (like “is this text AI-generated?”), StealthInk can encode rich details such as user ID and timestamps. It also holds up well even when attackers try to hide or fake the watermark, ensuring both privacy and accountability. This makes StealthInk a strong step toward safer and more trustworthy AI-generated content.
Primary Area: Applications
Keywords: LLM Watermark, bias, LLM Security
Submission Number: 9307