SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text Quality

Published: 06 Mar 2025, Last Modified: 16 Apr 2025, WMARK@ICLR2025, CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: watermarking, large language models
TL;DR: Preserving generated text quality and watermark effectiveness by watermarking only a small portion of tokens distributed across the generated text.
Abstract: With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLMs, enabling a simple and effective way to detect and monitor generated text. However, while existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of the generated text and the effectiveness of the watermarking process. In this work, we present a novel type of LLM watermark, *Sparse Watermark*, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text. To demonstrate this type of watermark, we introduce **SpARK**, a **Sp**arse Waterm**ARK** method that achieves sparsity by anchoring watermarked tokens to words that have specific Part-of-Speech (POS) tags. Our experimental results demonstrate that the proposed watermarking scheme, albeit *embarrassingly simple*, is *incredibly effective*, achieving high detectability while generating text that outperforms previous LLM watermarking methods in quality across various tasks.
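
The sketch below is one plausible instantiation of the sparse-watermarking idea stated in the abstract, not the authors' implementation: a KGW-style green-list logit bias is applied only at positions anchored to words with selected POS tags, and all other positions are left unwatermarked. The names `ANCHOR_TAGS`, `toy_pos_tag`, `green_list`, and `apply_sparse_watermark_bias`, and the specific anchor tags and bias value, are illustrative assumptions.

```python
# Minimal sketch: watermark only tokens anchored to specific POS tags.
import hashlib

VOCAB_SIZE = 50_000
GREEN_FRACTION = 0.5
DELTA = 2.0                      # logit bias added to green-list tokens
ANCHOR_TAGS = {"NOUN", "VERB"}   # hypothetical choice of anchor POS tags

def toy_pos_tag(word: str) -> str:
    """Stand-in POS tagger; a real system would use a proper NLP tagger."""
    return "VERB" if word.endswith("ing") else "NOUN"

def green_list(prev_token_id: int, key: str = "secret") -> set:
    """Derive a pseudorandom 'green' partition of the vocabulary from the previous token."""
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{prev_token_id}".encode()).digest()[:8], "big"
    )
    # Cheap deterministic selection of roughly GREEN_FRACTION of token ids.
    return {t for t in range(VOCAB_SIZE) if (t * 2654435761 + seed) % 100 < GREEN_FRACTION * 100}

def apply_sparse_watermark_bias(logits, prev_word: str, prev_token_id: int):
    """Bias next-token logits only when the previous word carries an anchor POS tag."""
    if toy_pos_tag(prev_word) not in ANCHOR_TAGS:
        return logits             # most positions stay untouched, preserving text quality
    greens = green_list(prev_token_id)
    return [x + DELTA if i in greens else x for i, x in enumerate(logits)]
```

Detection would then count green-list hits only at the anchored positions, so sparsity limits how much the sampling distribution is perturbed elsewhere; the actual SpARK anchoring and detection rules are described in the paper itself.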
Presenter: ~Duy_Cao_Hoang1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 58