A Nested Watermark for Large Language Models

ACL ARR 2024 June Submission4270 Authors

16 Jun 2024 (modified: 18 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The rapid development of large language models (LLMs) has raised concerns about their potential misuse for generating fake news and misinformation. To mitigate this risk, watermarking techniques for auto-regressive language models have been proposed as a means of detecting LLM-generated text. However, these methods assume that the watermarked target text contains a sufficient number of tokens, and detection accuracy degrades as the text becomes shorter. To address this issue, we introduce a novel nested watermark that embeds two watermarks in a nested structure. Our method achieves high detection accuracy with fewer tokens than conventional approaches. Our experiments show that, on short texts, the nested watermark outperforms a single watermark in both embedding success rate and text quality.
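The conventional watermarking scheme the abstract refers to detects a watermark statistically: the vocabulary is pseudo-randomly partitioned into "green" and "red" lists at each step, generation is biased toward green tokens, and detection counts how many observed tokens are green. The sketch below illustrates why short texts are hard to detect, using a hypothetical hash-based green-list rule and a standard z-score test; the constants `GAMMA` and `VOCAB_SIZE` and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib
import math

GAMMA = 0.25        # assumed fraction of the vocabulary marked "green" (illustrative)
VOCAB_SIZE = 50_000  # hypothetical vocabulary size

def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeding the
    partition with the previous token, as in conventional auto-regressive
    watermarking schemes (illustrative hash-based variant)."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def detection_z_score(tokens: list[int]) -> float:
    """z-score of the observed green-token count against the null
    hypothesis that the text is unwatermarked, i.e. each token is
    green independently with probability GAMMA."""
    t = len(tokens) - 1  # number of (prev, current) pairs scored
    green = sum(is_green(p, c) for p, c in zip(tokens, tokens[1:]))
    return (green - GAMMA * t) / math.sqrt(GAMMA * (1 - GAMMA) * t)
```

Because the z-score grows roughly with the square root of the token count, a fixed detection threshold (e.g. z > 4) becomes hard to reach for short texts, which is the failure mode the nested watermark targets.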
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: rumor/misinformation detection;
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4270