No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: watermarking, large language models, security, privacy
TL;DR: We reveal and evaluate new attack vectors that exploit the common design choices of LLM watermarks.
Abstract: Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack---leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.
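To make the kind of design choice the abstract refers to concrete, the sketch below illustrates a generic "green-list" LLM watermark detector (in the style of Kirchenbauer et al.), not the scheme studied in this paper: the vocabulary is pseudo-randomly partitioned using the previous token as a seed, the generator favors green tokens, and the detector counts green hits and reports a z-score. The function names, the SHA-256 seeding, and the parameters `gamma` and `vocab_size` are illustrative assumptions, not the paper's method.

```python
import hashlib
import math

def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set[int]:
    """Pseudo-randomly mark a gamma fraction of the vocabulary as 'green',
    seeded on the previous token (an assumed, illustrative seeding choice).
    A watermarking generator would boost the logits of these tokens; a
    detector only needs to recompute the same partition."""
    greens = set()
    for tok in range(vocab_size):
        digest = hashlib.sha256(f"{prev_token}-{tok}".encode()).digest()
        if digest[0] / 255.0 < gamma:
            greens.add(tok)
    return greens

def detect(tokens: list[int], vocab_size: int, gamma: float = 0.5) -> float:
    """Return a z-score comparing the observed number of green tokens to the
    gamma fraction expected in un-watermarked text; large values suggest the
    text carries the watermark."""
    hits = sum(
        1
        for prev, cur in zip(tokens, tokens[1:])
        if cur in green_list(prev, vocab_size, gamma)
    )
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

Seeding the partition on local context is exactly the sort of common design choice whose robustness, utility, and usability trade-offs the paper examines.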
Primary Area: Privacy
Submission Number: 12729