Two Halves Make a Whole: How to Reconcile Soundness and Robustness in Watermarking for Large Language Models

Lei Fan; Chenhao Tang; Weicheng Yang; Hong-Sheng Zhou

26 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Large language model, Watermark, Robustness, Soundness

Abstract: Watermarking techniques have been used to safeguard AI-generated content. In this paper, we study publicly detectable watermarking schemes (Fairoze et al.) and report several findings. First, we observe that two important security properties, robustness and soundness, may conflict with each other. We formally investigate these two properties in the presence of an arguably more realistic adversary, which we call an editing adversary, and prove an impossibility result: robustness and soundness cannot both be achieved by a single publicly detectable watermarking scheme. Second, we present our main result: we introduce, for the first time, the concept of a publicly detectable dual watermarking scheme for AI-generated content. We provide a novel construction that uses two publicly detectable watermarking schemes, each of which achieves one “half” of the two required properties: one achieves robustness, and the other achieves soundness. Combining the two halves into a whole achieves robustness and soundness at the same time. Our construction has been implemented and evaluated.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 5641
