Semantic differentiation for tackling challenges in watermarking low-entropy constrained generation outputs

ICLR 2026 Conference Submission22517 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Watermarking, entropy, language model, generation, machine translation, summarization

TL;DR: This paper identifies shortcomings in existing watermarking algorithms and proposes a novel approach to watermark low-entropy, constrained generation tasks

Abstract: We posit and demonstrate that current approaches to language model (LM) watermarking, while effective for open-ended generation, are inadequate for watermarking LM outputs in constrained generation tasks such as machine translation and abstractive summarization, because the output space has lower entropy. We investigate the reasons for these shortcomings in a variety of prominent watermarking approaches and propose an effective solution, based on sequence-level watermarking with semantic differentiation, that balances output quality, watermark detectability, and watermark imperceptibility. Specifically, we show that token-level watermarking algorithms, which modify logits over the vocabulary during autoregressive generation, fail because they under-utilize the sequence entropy that the LM makes available for watermarking constrained generation outputs. Sequence-level semantic watermarking algorithms are a promising alternative because they can exploit the higher sequence entropy rather than the low token-wise entropy, but we identify a different fundamental drawback in how current approaches are operationalized, which we term region collapse, that causes poor watermarking performance. Current approaches pseudorandomly partition the sequence-level representation space into valid and invalid regions for watermarking, but their operationalization causes most high-quality output embeddings to collapse into a single region, forcing a trade-off between output quality and watermarking effectiveness. To mitigate this, we devise a scheme, SeqMark, that differentiates the high-quality output subspace and partitions it into valid and invalid regions for watermarking, ensuring that high-quality outputs are spread evenly across all regions so that watermarking is effective without compromising output quality.
SeqMark substantially improves watermark detection accuracy (up to a 28% increase in F-score) while maintaining high generation quality in constrained generation settings.
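
The region-collapse effect described in the abstract can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the partition is a sign-of-random-projection scheme over random hyperplanes, the "high-quality candidates" are synthetic points clustered tightly around one anchor embedding, and mean-centering the candidates is only a hypothetical stand-in for semantic differentiation, not SeqMark itself. The function names are likewise hypothetical.

```python
import numpy as np


def region_ids(embeddings, hyperplanes):
    """Map each embedding to a region index from the sign pattern of its projections."""
    bits = (embeddings @ hyperplanes.T > 0).astype(int)
    weights = 1 << np.arange(hyperplanes.shape[0])  # binary place values
    return bits @ weights


def largest_region_share(ids):
    """Fraction of candidates that land in the most populated region."""
    return np.bincount(ids).max() / ids.size


def mean_largest_share(center, trials=50, seed=0,
                       dim=64, n_planes=3, n_candidates=200, noise=0.02):
    """Average the largest-region share over several random partitions.

    center=False partitions the raw embeddings (assumed baseline).
    center=True subtracts the candidate mean first (hypothetical stand-in
    for differentiating the high-quality subspace before partitioning).
    """
    rng = np.random.default_rng(seed)
    shares = []
    for _ in range(trials):
        # Synthetic assumption: good outputs for one input are near-duplicates
        # in embedding space, so they cluster around a single anchor point.
        anchor = rng.normal(size=dim)
        anchor /= np.linalg.norm(anchor)
        candidates = anchor + noise * rng.normal(size=(n_candidates, dim))
        hyperplanes = rng.normal(size=(n_planes, dim))
        if center:
            candidates = candidates - candidates.mean(axis=0)
        shares.append(largest_region_share(region_ids(candidates, hyperplanes)))
    return float(np.mean(shares))


raw_share = mean_largest_share(center=False)
centered_share = mean_largest_share(center=True)
print(f"largest-region share, raw embeddings:      {raw_share:.2f}")
print(f"largest-region share, centered embeddings: {centered_share:.2f}")
```

Under these assumptions, a largest-region share close to 1 means that nearly every candidate falls in the same region, so if that region is invalid there is no high-quality alternative to select. A share close to 1/8 (for three hyperplanes) means the candidates are spread across regions, which is the property the abstract attributes to partitioning the high-quality subspace directly.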

Primary Area: foundation or frontier models, including LLMs

Submission Number: 22517
