Advancing Beyond Identification: Multi-bit Watermark for Language Models

Published: 01 Jan 2023, Last Modified: 05 Nov 2023, CoRR 2023
Abstract: We propose a method to tackle misuse of large language models that goes beyond identifying machine-generated text. While existing methods focus on detection, certain malicious misuses demand tracing the adversarial user in order to counteract them. To address this, we propose Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during language model generation. Leveraging the benefits of zero-bit watermarking, our method enables robust extraction of the watermark without any model access, embedding and extraction of long messages ($\geq$ 32-bit) without finetuning, and preservation of text quality, all while still supporting zero-bit detection. Moreover, our watermark remains relatively robust under strong attacks such as interleaving with human-written text and paraphrasing.
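To make the idea concrete, below is a minimal, illustrative sketch of position-allocated multi-bit watermarking, not the paper's exact scheme. It assumes hypothetical parameters (a toy vocabulary size VOCAB_SIZE, bias strength DELTA, a SHA-256 hash of the previous token as the seed) and a stand-in "language model" that emits random logits; the key idea shown is that each generation step is pseudo-randomly allocated to one bit of the message, the vocabulary is split in two, and the half encoding that bit's value receives a logit bonus. Extraction re-derives the same allocations and partitions from the text alone and takes a majority vote per bit, so no model access is needed.

```python
import hashlib
import random

VOCAB_SIZE = 1_000  # toy vocabulary size (assumption for illustration)
NUM_BITS = 32       # length of the embedded message in bits
DELTA = 4.0         # logit bias added to the "favored" half (assumption)


def _seed(prev_token: int) -> int:
    # Pseudo-random seed derived from the previous token, as in
    # hash-based zero-bit watermarking schemes.
    return int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)


def allocate(prev_token: int) -> tuple[int, set[int]]:
    """Allocate the current position to one message-bit index and build the
    vocabulary half that encodes bit value 1 at this position."""
    rng = random.Random(_seed(prev_token))
    bit_pos = rng.randrange(NUM_BITS)       # position allocation
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    ones_half = set(perm[: VOCAB_SIZE // 2])
    return bit_pos, ones_half


def biased_step(logits: list[float], prev_token: int, message: list[int]) -> int:
    """Add +DELTA to the vocabulary half encoding the allocated bit's value,
    then decode greedily (a toy stand-in for the model's sampler)."""
    bit_pos, ones_half = allocate(prev_token)
    favored = ones_half if message[bit_pos] == 1 else set(range(VOCAB_SIZE)) - ones_half
    best_tok, best_score = 0, float("-inf")
    for tok, score in enumerate(logits):
        score += DELTA if tok in favored else 0.0
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok


def extract(tokens: list[int]) -> list[int]:
    """Model-free extraction: recompute each position's allocation and
    partition, then majority-vote the value of every message bit."""
    votes = [[0, 0] for _ in range(NUM_BITS)]
    for prev, cur in zip(tokens, tokens[1:]):
        bit_pos, ones_half = allocate(prev)
        votes[bit_pos][1 if cur in ones_half else 0] += 1
    return [1 if v1 >= v0 else 0 for v0, v1 in votes]


if __name__ == "__main__":
    message = [random.randint(0, 1) for _ in range(NUM_BITS)]
    rng = random.Random(0)
    tokens = [0]  # start token
    for _ in range(400):  # random logits stand in for a real language model
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(biased_step(logits, tokens[-1], message))
    print("recovered == embedded:", extract(tokens) == message)
```

Because extraction only needs the generated tokens and the shared hashing scheme, the message survives as long as a majority of the tokens allocated to each bit position keep their watermarked coloring, which is what gives the approach some robustness to edits such as interleaved human text.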