Keywords: LLM security, watermarking
Abstract: The rapid advancement of Large Language Models (LLMs) has established them as a foundational technology for many AI- and ML-powered human–computer interactions. A critical challenge in this context is the attribution of LLM-generated text --- for example, identifying the specific language model that generated it or the individual user who prompted the model. This capability is essential for combating misinformation, fake news, misattribution, and plagiarism. One of the key techniques for addressing this challenge is digital watermarking. This work presents a watermarking scheme for LLM-generated text based on Lagrange interpolation, enabling the recovery of a multi-bit watermark even when the text has been redacted by an adversary. The core idea is to embed a sequence of points (x, f(x)) that all lie on a single straight line. During extraction, the algorithm recovers the original points along with many spurious ones, forming an instance of the Maximum Collinear Points (MCP) problem, which can be solved efficiently. Experimental results demonstrate that the proposed method is scalable and effective, allowing the embedding of a multi-bit watermark.
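The extraction step described in the abstract reduces to the Maximum Collinear Points problem: given the genuine watermark points mixed with spurious ones, find the largest subset lying on one straight line. The sketch below is a minimal illustration of that reduction, not the paper's actual algorithm; the sample points and the line y = 2x + 1 are hypothetical.

```python
from fractions import Fraction
from collections import defaultdict

def max_collinear_points(points):
    """Return the largest subset of the input points lying on one straight line.

    O(n^2) sketch: for each anchor point, bucket the remaining points by
    their exact rational slope to the anchor; the largest bucket plus the
    anchor is the best line through that anchor.
    """
    best = []
    for i, (x1, y1) in enumerate(points):
        buckets = defaultdict(list)
        for x2, y2 in points[i + 1:]:
            # Exact rational slope avoids floating-point grouping errors;
            # None stands in for a vertical line.
            slope = None if x2 == x1 else Fraction(y2 - y1, x2 - x1)
            buckets[slope].append((x2, y2))
        for group in buckets.values():
            candidate = [(x1, y1)] + group
            if len(candidate) > len(best):
                best = candidate
    return best

# Hypothetical extraction output: three genuine watermark points on
# y = 2x + 1 mixed with two spurious points.
recovered = [(0, 1), (1, 3), (2, 5), (1, 7), (3, 2)]
line = max_collinear_points(recovered)  # → [(0, 1), (1, 3), (2, 5)]
```

Grouping by exact `Fraction` slopes keeps the bucketing robust for integer coordinates; a production extractor would also need to decode the watermark bits from the recovered line, which this sketch omits.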
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety, security, watermarking
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6458