Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

20 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | Everyone | CC BY 4.0
Keywords: Text Watermarking, Multi-bit Watermarking, Large Language Models, Block-wise Error Correction, Window-Shifting Detection, Insertion/Deletion Robustness, Codeword-Presence Verification
Abstract: Recent multi-bit watermarking methods for large language models (LLMs) have focused primarily on maximizing extraction rates. However, our reproduction studies reveal a critical limitation: these approaches suffer from false positive rates (FPR) high enough to undermine practical deployment. Specifically, existing multi-bit encoding schemes such as RS-Watermark achieve high true positive rates (TPR) even under insertion/deletion attacks but exhibit FPR exceeding 0.90, which makes them unreliable for real-world applications. We propose a robust multi-bit text watermarking framework that addresses this reliability problem through two key innovations: (i) block-wise error correction, which embeds complete codewords within independent text segments, localizing the impact of edits and preventing cascade failures, and (ii) window-shifting detection, which systematically recovers codewords despite misalignments induced by insertions and deletions. Our method verifies watermark presence by confirming recovery of the initially embedded codewords, which significantly reduces false positives while maintaining high detection accuracy. Experiments on OPT-1.3B and LLaMA-3.2-3B demonstrate substantial improvements over existing multi-bit methods. Under 10% synonym substitution attacks on 200-token texts, our approach achieves a TPR of 0.965 with an FPR of 0.02 (Precision: 0.9797), compared to RS-Watermark's TPR of 0.97 with an FPR of 0.925 (Precision: 0.5132). The framework is code-agnostic, supports progressive detection from partial text, and provides theoretical guarantees for false-positive control. These results establish our method as a practical solution for reliable multi-bit watermarking in production environments.
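The two ideas named in the abstract, block-wise codewords and window-shifting detection, can be illustrated with a toy sketch on raw bit streams. This is not the authors' implementation: the repetition code stands in for whatever error-correcting code is used (the abstract says the framework is code-agnostic), and the function names, the `max_shift` and `max_errors` parameters, and the drift-tracking step are all assumptions made for illustration. A real system would extract these bits from token-level watermark signals rather than from a list.

```python
from typing import List

def encode_block(bits: List[int], rep: int = 3) -> List[int]:
    """Toy repetition code, a stand-in for any block error-correcting code."""
    return [b for b in bits for _ in range(rep)]

def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def detect(stream: List[int], codewords: List[List[int]],
           max_shift: int = 2, max_errors: int = 1) -> int:
    """Count embedded codewords found near their expected positions.

    Each block is searched independently within +/- max_shift positions,
    so an insertion or deletion misaligns only the search window, not the
    decoding of every later block. Tracking the accumulated drift is an
    assumption of this sketch, not something stated in the abstract.
    """
    n = len(codewords[0])
    found, drift = 0, 0
    for i, cw in enumerate(codewords):
        base = i * n + drift
        # Try the smallest shifts first: 0, -1, +1, -2, +2, ...
        for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
            start = base + shift
            if start < 0:
                continue
            window = stream[start:start + n]
            if len(window) == n and hamming(window, cw) <= max_errors:
                found += 1
                drift += shift
                break
    return found

# Three 2-bit message blocks, each embedded as its own 6-bit codeword.
codewords = [encode_block(b) for b in ([1, 0], [0, 1], [1, 1])]
clean = [bit for cw in codewords for bit in cw]

# Simulate an insertion: one spurious bit just before the second block.
stream = clean[:6] + [1] + clean[6:]

print(detect(stream, codewords))     # codewords recovered despite the insertion
print(detect([0] * 18, codewords))   # unwatermarked input: none recovered
```

In this toy setting, presence would be declared only when enough of the initially embedded codewords are recovered, which is the codeword-presence verification idea the abstract credits for the lower false positive rate. Setting `max_shift=0` disables the window search and loses the block that follows the inserted bit.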
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23276