MIRRORMARK: A Distortion-Free Multi-Bit Watermark for Large Language Models

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: LLM Watermark, bias, LLM Security
TL;DR: We propose a multi-bit and distortion-free watermark for large language models.
Abstract: As large language models (LLMs) become increasingly integral to broad applications such as question answering and content creation, reliable content attribution and accountability have become urgent. Watermarking offers a promising approach to identifying AI-generated text. However, existing approaches either provide only a binary provenance signal or perturb the sampling distribution, degrading text quality; approaches that preserve text quality, in turn, often exhibit weak detectability and poor robustness. We propose MirrorMark, a multi-bit and distortion-free watermark for LLMs. By mirroring the sampling randomness in a measure-preserving way, MirrorMark embeds multi-bit messages without altering the token probability distribution during generation, so text quality is maintained by design. For robustness, we employ a content-based scheduler that partitions the message into per-position symbols and allocates tokens to each symbol nearly uniformly, balancing token assignments across positions while remaining robust to desynchronization under insertions and deletions. We also present a theoretical analysis that models detection error as a function of the number of pseudorandom draws per generation step, offering interpretability for our empirical results and insights into the design of high-detectability multi-bit watermarks. In comparisons with state-of-the-art multi-bit baselines, MirrorMark preserves text quality comparable to non-watermarked text while delivering superior detectability: with 54 bits embedded in 300 tokens, it improves bit accuracy by 8–12\% and correctly identifies up to 11\% more watermarked texts at a fixed false positive rate of 1\%. These results show that MirrorMark enables practical attribution, offering a scalable path to provenance and accountability in LLM deployment.
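To illustrate the distortion-free idea the abstract describes (not the paper's actual mirroring scheme), here is a minimal sketch of keyed inverse-transform sampling: a pseudorandom uniform is derived from a secret key and the recent context, and tokens are drawn by inverting the model's CDF. Because the draw is uniform, each token still follows the model's distribution exactly, while a detector holding the key can recompute the draws. The function names `keyed_uniform` and `sample_token` are hypothetical.

```python
import hashlib

def keyed_uniform(key: bytes, context: tuple) -> float:
    # Derive a pseudorandom value in [0, 1) from a secret key and the
    # recent token context (a stand-in for the watermark's seeding step).
    h = hashlib.sha256(key + repr(context).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def sample_token(probs, key: bytes, context: tuple) -> int:
    # Inverse-transform sampling driven by the keyed uniform draw.
    # Since the draw is uniform, the sampled token follows `probs`
    # exactly, so the sampling distribution is left unchanged.
    u = keyed_uniform(key, context)
    cum = 0.0
    for tok, p in enumerate(probs):
        cum += p
        if u < cum:
            return tok
    return len(probs) - 1  # guard against floating-point rounding

# A detector sharing the key can recompute the same draw and
# test whether observed tokens are consistent with it.
probs = [0.1, 0.6, 0.3]
tok = sample_token(probs, b"secret-key", ("the", "cat"))
```

Embedding multiple bits, as MirrorMark does, additionally requires mapping message symbols into the pseudorandom draws and scheduling them across positions, which this sketch omits.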
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10657