SEAL: Entangled White-box Watermarks on Low-Rank Adaptation

ACL ARR 2024 June Submission4338 Authors

16 Jun 2024 (modified: 02 Jul 2024) · CC BY 4.0
Abstract: Watermarking is a promising copyright protection method for Deep Neural Networks (DNNs). It works by embedding a secret identity message into the DNN during training, and extracting it later when copyright is disputed. Prior work has proposed various techniques that can embed secret identity messages into different layers of a DNN. We observe that models nowadays are frequently created and distributed in the form of Low-Rank Adaptation (LoRA) weights, because of its significant savings in training cost. We propose SEAL (SEcure wAtermarking on LoRA weights), the first watermarking method tailored for LoRA weights. Unlike existing methods that focus on specific layers and are unsuitable for LoRA's unique structure, SEAL embeds a secret, non-trainable matrix between trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles this passport with the LoRA weights through finetuning, and distributes the finetuned weights after hiding the passport. We demonstrate that SEAL is robust against a variety of known attacks, and works without compromising the performance of watermarked models on various NLP tasks.
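The abstract describes embedding a secret, non-trainable passport matrix between the trainable LoRA factors, then hiding it before distribution. A minimal numpy sketch of that idea, assuming the passport C sits multiplicatively between the LoRA up- and down-projections and is later folded into one factor (the exact formulation and dimensions are not given in the abstract; the shapes and folding step here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 16, 16, 4           # hypothetical model dims and LoRA rank
W0 = rng.normal(size=(d, k))  # frozen pretrained weight
A = rng.normal(size=(r, k))   # trainable LoRA down-projection
B = rng.normal(size=(d, r))   # trainable LoRA up-projection
C = rng.normal(size=(r, r))   # secret, non-trainable passport (assumed r x r)

def forward_with_passport(x):
    # During finetuning, the passport sits between the trainable factors,
    # so gradients through B and A become entangled with C.
    return W0 @ x + B @ C @ A @ x

# Before distribution, the passport is hidden by folding it into one factor,
# leaving ordinary-looking LoRA weights.
B_dist = B @ C

def forward_distributed(x):
    return W0 @ x + B_dist @ A @ x

x = rng.normal(size=(k,))
# The distributed weights reproduce the watermarked model's outputs exactly.
assert np.allclose(forward_with_passport(x), forward_distributed(x))
```

The folding step illustrates why the distributed weights look like plain LoRA factors: the product B @ C is just another d-by-r matrix, while the owner can later re-factor it with the secret C to claim ownership.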
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: transfer learning, parameter-efficient-training
Contribution Types: Model analysis & interpretability, Position papers
Languages Studied: English
Submission Number: 4338