Keywords: AI Safety, LLM Detection, LLM Watermarking
TL;DR: We introduce a new LLM output watermarking scheme that achieves a state-of-the-art detectability-quality tradeoff for natural language generation tasks in English.
Abstract: Watermarking, whereby LLM outputs are steered to encode an easily identifiable digital signature, has recently gained attention as a potential solution for detecting synthetically generated text. However, watermarking schemes require tradeoffs between detectability (i.e., how easily the watermark can be identified by an algorithm) and quality of the generated text (i.e., the stylistic and semantic disruption to the normal generation of the LLM). In this work, we propose a simple extension to the Soft Red List watermark, Softer Red List, which enables higher detectability while maintaining text quality on par with non-watermarked text. Specifically, Softer Red List improves the classical red/green token algorithm by adding a probability truncation filter before boosting the probabilities of tokens in the green list. Despite its simplicity, Softer Red List matches or exceeds the performance of previously published LLM watermarking schemes, notably achieving a better detection rate at a low false positive rate (FPR) than SynthID in the disinformation detection setting, all while maintaining comparable perplexity and better reasoning capabilities.
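The abstract's core idea (truncate low-probability tokens, then boost green-list tokens among the survivors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the truncation is assumed to be nucleus-style with a hypothetical `trunc_p` parameter, and the green-list partition is seeded from the previous token as in the standard red/green scheme; all function names and parameter values here are assumptions.

```python
import hashlib
import math
import random

def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Seed a PRNG from the previous token to get a pseudo-random green/red
    # partition of the vocabulary (standard red/green watermark construction).
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def softer_red_list_probs(logits, prev_token, delta=2.0, gamma=0.5, trunc_p=0.95):
    """Hypothetical sketch: apply a probability truncation filter, then
    boost green-list logits by delta among the surviving tokens."""
    vocab = len(logits)
    # Softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Probability truncation filter (assumed here to be nucleus-style:
    # keep the smallest set of tokens whose cumulative mass >= trunc_p).
    order = sorted(range(vocab), key=lambda i: -probs[i])
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= trunc_p:
            break
    green = green_list(prev_token, vocab, gamma)
    # Boost green tokens only among the tokens that survived truncation;
    # truncated tokens get zero probability.
    boosted = [
        (logits[i] + (delta if i in green else 0.0)) if i in kept else float("-inf")
        for i in range(vocab)
    ]
    m = max(boosted)
    exps = [math.exp(b - m) if b != float("-inf") else 0.0 for b in boosted]
    z = sum(exps)
    return [e / z for e in exps]
```

The intuition for the truncation step is that boosting green tokens inside a long low-probability tail degrades quality the most, so filtering the tail first lets the boost act only on plausible candidates.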
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 16