Necessary and Sufficient Watermark for Large Language Models

ACL ARR 2024 June Submission 411 Authors

10 Jun 2024 (modified: 08 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large language models (LLMs) can now generate texts that are indistinguishable from those written by humans. This remarkable performance increases the risk of LLMs being used for malicious purposes, so methods are needed to distinguish texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing methods can successfully detect texts generated by LLMs, they inevitably degrade text quality. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark), which inserts watermarks into generated texts with minimal degradation of text quality. More specifically, we derive the minimum constraints that must be imposed on generated texts to distinguish whether they were written by LLMs or by humans, and we formulate the NS-Watermark as a constrained optimization problem. Through experiments, we demonstrate that the NS-Watermark generates more natural texts than existing watermarking methods and distinguishes more accurately between texts written by LLMs and those written by humans. In particular, on machine translation tasks, the NS-Watermark outperforms an existing watermarking method by up to $30$ BLEU points.
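As a rough illustration of the constrained formulation described in the abstract (the notation below, including the green-token count $|y|_G$ and the threshold $\gamma$, is our assumption rather than the paper's exact definition), the NS-Watermark can be read as maximizing generation quality subject to a minimal detectability constraint:

$$\max_{y}\ \log p_{\mathrm{LLM}}(y \mid x) \quad \text{s.t.} \quad |y|_G \ \ge\ \gamma\,|y|,$$

where $x$ is the prompt, $y$ is the generated text, $|y|_G$ counts the watermark ("green-list") tokens in $y$, and $\gamma$ is a threshold chosen so that human-written texts rarely satisfy the constraint by chance. This is only a sketch of the general idea, not the authors' exact objective.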
Paper Type: Long
Research Area: Generation
Research Area Keywords: watermarking
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 411