DynamicBias: Sequence-Aware Calibrated Watermarking for Large Language Models

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: watermarking, robustness, safety
Abstract: Text watermarking has attracted significant research interest as a way to mitigate LLM-related harms by enabling reliable identification of machine-generated text. In particular, "green" and "red" vocabulary-partition watermarking, which uses a static bias to skew token sampling toward green tokens and away from red ones, is a promising approach. However, a persistent trade-off remains: stronger watermarks improve detectability but can harm quality, while weaker ones preserve quality but are harder to detect and easier to remove via paraphrasing. A key reason is that a static bias ignores heterogeneous logit distributions across models, domains, and languages, yielding inconsistent performance and hindering practical deployment. Our preliminary investigation shows substantial variability in these distributions and the associated performance disparities, driven by model certainty, measured as the margin between the top logit and the average of a small pool of next-best token logits. Building on this observation, we propose DynamicBias, which calibrates the bias at each step using a sequence-level average of this margin with a single scaling parameter $\alpha$. Theoretically, we show that DynamicBias admits a unique optimal $\alpha$ and that expected detectability increases as the sequence-level margin grows. This calibration yields consistent detectability across models and integrates directly with existing vocabulary-partition watermarks, offering a practical solution for real-world deployment. Extensive experiments across four LLMs and three languages demonstrate improved detection with competitive text quality and stronger robustness to paraphrasing.
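The abstract's calibration idea can be sketched as follows. This is a minimal, hypothetical illustration only: the function name `dynamic_bias_step`, the `pool_size` of next-best logits, and the running-average bookkeeping are assumptions, since the page gives the mechanism in prose but no reference implementation.

```python
import numpy as np

def dynamic_bias_step(logits, green_mask, margin_history, alpha=1.0, pool_size=5):
    """One decoding step of a DynamicBias-style calibration (illustrative sketch).

    The per-step certainty margin is the gap between the top logit and the
    mean of the next `pool_size` logits. The green-token bias is alpha times
    the running (sequence-level) average of this margin, so confident models
    receive a proportionally larger bias than uncertain ones.
    """
    sorted_logits = np.sort(logits)[::-1]                      # descending
    margin = sorted_logits[0] - sorted_logits[1:1 + pool_size].mean()
    margin_history.append(margin)                              # sequence-level state
    bias = alpha * (sum(margin_history) / len(margin_history)) # running average
    biased = np.where(green_mask, logits + bias, logits)       # boost green tokens only
    return biased, bias
```

In a static-bias watermark, `bias` would be a fixed constant; here it adapts to the model's certainty over the sequence, which is the calibration the abstract attributes to DynamicBias.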
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24424