A Principled Approach to Natural Language Watermarking

Zhe Ji; Qiansiqi Hu; Yicheng Zheng; Liyao Xiang; Xinbing Wang

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, License: CC BY 4.0

Abstract: Recently, there has been a surge in the misuse of machine-generated natural language content by unauthorized parties. Watermarking is a well-recognized technique for addressing this issue by tracing the provenance of the text. However, we find that most existing text watermarking systems rely on ad hoc designs and thus suffer from fundamental vulnerabilities. We propose a principled design for text watermarking based on a theoretical information-hiding framework. The watermarking party and the attacker play a rate-distortion-constrained capacity game to achieve the maximum rate of reliable transmission, i.e., the watermark capacity. The capacity can be expressed as the mutual information between the encoding and the attacker's corrupted text, which indicates how many watermark bits are effectively conveyed under distortion constraints. The system is realized by a learning-based framework with mutual information neural estimators. In this framework, we assume an omniscient attacker and pit the watermarking party against an attacker who is fully aware of the watermarking strategy. As a result, the watermarking party achieves higher robustness against removal attacks. We further show that incorporating side information substantially enhances the efficacy and robustness of the watermarking system. Experimental results show that our watermarking system outperforms the state of the art in terms of capacity, robustness, and preservation of text semantics.
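Mutual information neural estimators (MINE) train a critic function T to maximize the Donsker-Varadhan lower bound I(X;Y) >= E_P[T] - log E_{P_X x P_Y}[exp(T)], where P is the joint distribution and P_X x P_Y the product of marginals. The sketch below illustrates only that bound and is not the paper's estimator. As simplifying assumptions, it replaces the neural critic with a hypothetical one-parameter family T_a(x, y) = a * x * y, replaces gradient ascent with a grid search over a, and uses correlated Gaussians, whose true mutual information -1/2 ln(1 - rho^2) is known. The function names are illustrative.

```python
import math
import random


def sample_correlated_gaussians(n, rho, rng):
    """Draw n pairs (x, y) of standard normals with correlation coefficient rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def dv_lower_bound(critic, joint, marginal):
    """Donsker-Varadhan bound E_joint[T] - log E_marginal[exp(T)], in nats."""
    joint_term = sum(critic(x, y) for x, y in joint) / len(joint)
    marginal_term = sum(math.exp(critic(x, y)) for x, y in marginal) / len(marginal)
    return joint_term - math.log(marginal_term)


def estimate_mi(pairs, rng, grid=None):
    """Maximize the DV bound over the toy critic family T_a(x, y) = a * x * y.

    The grid stops at a = 0.45 so that exp(a * x * y) under the product of
    marginals has finite variance (this requires 2a < 1).
    """
    if grid is None:
        grid = [k / 100 for k in range(46)]
    # Shuffling the y values breaks the pairing, giving samples from p(x)p(y).
    ys = [y for _, y in pairs]
    rng.shuffle(ys)
    marginal = [(x, y) for (x, _), y in zip(pairs, ys)]
    best = -math.inf
    for a in grid:
        bound = dv_lower_bound(lambda x, y, a=a: a * x * y, pairs, marginal)
        best = max(best, bound)
    return best
```

With rho = 0.5 the true value is about 0.144 nats, while the best member of this restricted critic family reaches only about 0.113 nats in expectation, so the estimate stays below the truth. That gap is the reason MINE uses an expressive neural network as the critic: the bound is only as tight as the critic family allows.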

Primary Subject Area: [Experience] Multimedia Applications

Secondary Subject Area: [Content] Vision and Language

Relevance To Conference: With the popularity of large language models in multimedia applications, protecting intellectual property in natural language text has become a major concern. This work presents a novel approach to text watermarking that addresses the critical concerns of intellectual property protection and provenance tracing for natural language. We formulate watermarking as a rate-distortion-constrained capacity game between the watermarking party and a potential attacker. This game framework aims to achieve the maximum rate of reliable watermark transmission, that is, the watermark capacity under attacks. We consider our watermarking system an interesting multimedia application.
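The max-min structure of a capacity game can be illustrated with a deliberately tiny stand-in that is not the paper's formulation. In this sketch the watermark is a single bit X with P(X=1) = q chosen by the watermarking party, the attacker is modeled as a binary symmetric channel that flips the bit with probability p, and the attacker's distortion constraint is an expected Hamming distortion p <= D. The watermarking party maximizes I(X;Y), the attacker minimizes it, and both strategies are found by grid search. Embedder distortion and side information are omitted, and the names binary_entropy, mutual_information_bsc, and capacity_game are hypothetical.

```python
import math


def binary_entropy(x):
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def mutual_information_bsc(q, p):
    """I(X;Y) in bits when P(X=1) = q and the attacker flips X with probability p."""
    p_y1 = q * (1.0 - p) + (1.0 - q) * p
    return binary_entropy(p_y1) - binary_entropy(p)


def capacity_game(distortion, steps=100):
    """Grid-search max over q of min over p <= distortion of I(X;Y).

    Returns (game value in bits, watermarker's best q, attacker's reply p).
    """
    qs = [i / steps for i in range(steps + 1)]
    ps = [distortion * j / steps for j in range(steps + 1)]
    best = (-1.0, 0.0, 0.0)
    for q in qs:
        # The attacker, knowing q, picks the most damaging admissible p.
        worst_p = min(ps, key=lambda p: mutual_information_bsc(q, p))
        value = mutual_information_bsc(q, worst_p)
        if value > best[0]:
            best = (value, q, worst_p)
    return best
```

For example, capacity_game(0.1) returns a value of about 0.531 bits at q = 0.5, with the attacker spending its full budget p = 0.1. This matches the textbook binary symmetric channel capacity 1 - h(D), and it shows the intended behavior of the game: the attacker's distortion budget, not the watermarking party's preference, caps the reliable rate.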

Supplementary Material: zip

Submission Number: 4567