Balanced Watermark: A Simple High-Imperceptibility Watermark for Large Language Models

ACL ARR 2024 June Submission 2362 Authors

15 Jun 2024 (modified: 07 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: To counteract the potential risks posed by increasingly capable Large Language Models (LLMs), several scholars have attempted to apply watermarking to the detection of LLM-generated text. Watermark researchers typically focus on detectability, robustness, and invisibility, but they tend to overlook imperceptibility, which is crucial for preventing the watermark from being cracked. A watermark with low imperceptibility can be stolen and analyzed by malicious users, who can then forge watermarked text. To fill this research gap, we design Balanced Watermark (BW), which balances the watermark strength across the vocabulary so that the watermarked output distribution fits that of a non-watermarked LLM, thereby enhancing imperceptibility. To evaluate the imperceptibility of watermarks effectively, we are the first to design a dedicated metric for it. Our experiments show that BW effectively improves imperceptibility while maintaining the watermark's high performance on other properties.
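The abstract does not specify the BW construction, so the sketch below is purely illustrative: it shows one plausible reading of "balancing the watermark strength across the vocabulary" on top of a standard KGW-style green-list watermark, where the per-step logit bias is scaled by the model's entropy so that confident (low-entropy) steps are perturbed less and the watermarked distribution stays closer to the original. All names, parameters, and the entropy-based scaling rule are assumptions, not the paper's method.

```python
import hashlib
import numpy as np

def green_mask(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> np.ndarray:
    """Pseudo-randomly partition the vocabulary into green/red lists,
    seeded by the previous token (as in KGW-style watermarks)."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.random(vocab_size) < gamma  # True = "green" token

def balanced_watermark_logits(logits: np.ndarray, prev_token_id: int,
                              delta: float = 2.0) -> np.ndarray:
    """Add a bias to green-token logits, scaled down when the model is
    already confident, so the output distribution shifts less.
    (Hypothetical interpretation of "balanced watermark strength".)"""
    mask = green_mask(prev_token_id, logits.shape[-1])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Entropy-based scaling in [0, 1]: low-entropy steps get a weaker bias.
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    scale = entropy / np.log(logits.shape[-1])
    biased = logits.copy()
    biased[mask] += delta * scale
    return biased

# Example: bias a random logit vector over a 50-token vocabulary.
logits = np.random.default_rng(0).normal(size=50)
print(balanced_watermark_logits(logits, prev_token_id=123)[:5])
```

Detection in such schemes typically counts the fraction of green tokens in a candidate text and computes a z-score against the expected fraction gamma; whether BW's detector follows this recipe is not stated in the abstract.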
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 2362