Abstract: Text watermarking, which modify tokens to embed watermark, has proven effective in detecting machine-generated texts. Yet its
application to low-entropy texts like code and mathematics presents significant challenges. A fair number of tokens in these texts are hardly modifiable without changing the intended meaning, causing statistical measures to falsely indicate the absence of a watermark. Existing research addresses this issue by rely mainly on a limited number of high-entropy tokens, which are considered flexible for modification, and accurately reflecting watermarks. However, their detection accuracy remains suboptimal, as they neglect strong watermark evidences embedded in low entropy tokens modified through watermarking. To overcome this limitation, we introduce Bayesian Inference-based Watermark Detector (BIWD), which thoroughly exploit watermark information from every token, by leveraging the posterior probability of watermark’s presence. We theoretically prove the optimality of our method in terms of detection accuracy, and demonstrate its superiority across various datasets, models, and watermark injection strategies. Notably, our method achieves up to 50% and 70% relative improvements in detection accuracy over the best baselines in code generation and math problem-solving tasks, respectively.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: ethical considerations in NLP applications,policy and governance,security and privacy,value-centered design
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 579
Loading