Distortion-free Watermarking for Large Language Models via Adaptive Top-$p$ Sampling

03 Sept 2025 (modified: 25 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: watermarking, large language models, top-$p$ sampling, beta distribution
Abstract: Incorporating watermarking techniques into large language models (LLMs) is a promising way to determine whether a text was generated by a specific LLM. Existing green-red list watermarking schemes embed watermarks by indiscriminately adding a bias to the logits of all tokens in the green list, distorting the output by disturbing the original generative distribution. To move toward distortion-free watermarking, we propose $p$-Mark, an adaptive scheme that identifies the potential green tokens eligible for bias by leveraging the Beta distribution to dynamically adjust the sampling threshold in top-$p$ sampling. This preserves the diversity of watermarked text while maintaining output quality during the watermarking process. Experiments on various LLMs show that $p$-Mark improves the quality of generated text while achieving superior watermark detectability compared to existing baselines.
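The abstract's mechanism can be illustrated with a minimal sketch. This is not the authors' implementation; the Beta parameters, green-list fraction, bias value `delta`, and the hash-based green-list seeding are all illustrative assumptions. It shows the core idea: draw a per-step top-$p$ threshold from a Beta distribution, and apply the green-list bias only to tokens inside that adaptive nucleus rather than to the whole vocabulary.

```python
import numpy as np

def p_mark_sample(logits, prev_token, key=42, a=5.0, b=2.0, delta=2.0, rng=None):
    """Illustrative sketch of Beta-adaptive top-p watermarked sampling
    (hypothetical parameters, not the paper's actual algorithm)."""
    if rng is None:
        rng = np.random.default_rng(key)
    # 1) Draw an adaptive nucleus threshold p ~ Beta(a, b).
    p = rng.beta(a, b)
    # 2) Softmax the logits to recover the original distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # 3) Keep the smallest top-probability set whose cumulative mass reaches p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, p)) + 1
    nucleus = order[:k]
    # 4) Seed a pseudo-random green list from the previous token (assumed
    #    scheme) and bias only the tokens inside the nucleus.
    green_rng = np.random.default_rng(hash((key, int(prev_token))) % (2**32))
    green_mask = green_rng.random(len(logits)) < 0.5  # assumed 50% green
    biased = logits.astype(float).copy()
    biased[nucleus] += delta * green_mask[nucleus]
    # 5) Renormalize over the nucleus and sample a token from it.
    nucl = np.exp(biased[nucleus] - biased[nucleus].max())
    nucl /= nucl.sum()
    return int(rng.choice(nucleus, p=nucl))

logits = np.array([2.0, 1.0, 0.5, -1.0])
token = p_mark_sample(logits, prev_token=3)
```

Because the bias is confined to an adaptively sized nucleus of already-likely tokens, low-probability tokens are never promoted, which is the intuition behind the claimed quality preservation.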
Primary Area: generative models
Submission Number: 1612