Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone

TL;DR: We propose embedding multi-bit information into language model outputs for tackling malicious misuse of large language models.

Abstract: We show the viability of tackling misuses of large language models beyond the identification of machine-generated text. While existing methods focus on detection only, some malicious misuses demand tracing the adversary user in order to counteract them. To address this, we propose Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during language model generation. Leveraging the benefits of zero-bit watermarking (Kirchenbauer et al., 2023a), our method enables robust extraction of the watermark without any model access, embedding and extraction of long messages ($\geq$ 32-bit) without finetuning, and preservation of text quality, while still allowing zero-bit detection. Moreover, our watermark is relatively robust under strong attacks such as interleaving human text and paraphrasing. We compare with existing works to show the effectiveness of our scheme in terms of robustness and latency.

Paper Type: long

Research Area: NLP Applications

Contribution Types: Model analysis & interpretability, NLP engineering experiment

Languages Studied: English
