SEAL: Entangled White-box Watermarks on Low-Rank Adaptation

ACL ARR 2024 June Submission4338 Authors

16 Jun 2024 (modified: 02 Jul 2024) · CC BY 4.0
Abstract: Watermarking is a promising copyright protection method for Deep Neural Networks (DNNs). It works by embedding a secret identity message into the DNN during training, and extracting it later when copyright is disputed. Prior work has proposed various techniques that can embed secret identity messages into different layers of a DNN. We observe that models nowadays are frequently created and distributed in the form of Low-Rank Adaptation (LoRA) weights, because of its significant savings in training cost. We propose SEAL (SEcure wAtermarking on LoRA weights), the first watermarking method tailored for LoRA weights. Unlike existing methods that focus on specific layers and are unsuitable for LoRA's unique structure, SEAL embeds a secret, non-trainable matrix between trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles this passport with the LoRA weights through finetuning, and distributes the finetuned weights after hiding the passport. We demonstrate that SEAL is robust against a variety of known attacks, and works without compromising the performance of watermarked models on various NLP tasks.
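The abstract describes embedding a secret, non-trainable passport matrix between the trainable LoRA factors, then hiding it before distribution. A minimal numpy sketch of that idea, assuming the passport C sits multiplicatively between the LoRA up- and down-projections and is later folded into one factor (the exact formulation and dimensions are not given in the abstract; the shapes and folding step here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 16, 16, 4           # hypothetical model dims and LoRA rank
W0 = rng.normal(size=(d, k))  # frozen pretrained weight
A = rng.normal(size=(r, k))   # trainable LoRA down-projection
B = rng.normal(size=(d, r))   # trainable LoRA up-projection
C = rng.normal(size=(r, r))   # secret, non-trainable passport (assumed r x r)

def forward_with_passport(x):
    # During finetuning, the passport sits between the trainable factors,
    # so gradients through B and A become entangled with C.
    return W0 @ x + B @ C @ A @ x

# Before distribution, the passport is hidden by folding it into one factor,
# leaving ordinary-looking LoRA weights.
B_dist = B @ C

def forward_distributed(x):
    return W0 @ x + B_dist @ A @ x

x = rng.normal(size=(k,))
# The distributed weights reproduce the watermarked model's outputs exactly.
assert np.allclose(forward_with_passport(x), forward_distributed(x))
```

The folding step illustrates why the distributed weights look like plain LoRA factors: the product B @ C is just another d-by-r matrix, while the owner can later re-factor it with the secret C to claim ownership.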
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: transfer learning, parameter-efficient-training
Contribution Types: Model analysis & interpretability, Position papers
Languages Studied: English
Submission Number: 4338