Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone
TL;DR: We propose embedding multi-bit information into language model outputs for tackling malicious misuse of large language models.
Abstract: We show that misuse of large language models can be addressed beyond merely identifying machine-generated text. Existing methods focus on detection only, but countering some malicious misuse requires tracing the adversarial user. To address this, we propose Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during language model generation. Building on zero-bit watermarking (Kirchenbauer et al., 2023a), our method simultaneously enables robust watermark extraction without any model access, embedding and extraction of long messages ($\geq$ 32 bits) without finetuning, preservation of text quality, and zero-bit detection. Moreover, our watermark is relatively robust under strong attacks such as interleaving human-written text and paraphrasing. We compare against existing work to show the effectiveness of our scheme in terms of robustness and latency.
Paper Type: long
Research Area: NLP Applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
