Abstract: Large language models (LLMs) have evolved rapidly and demonstrated strong performance across a wide range of tasks in recent months. Training these models is both expensive and time-consuming. Consequently, some companies have begun to offer embedding as a service (EaaS) built on these LLMs to recoup their investment. However, this exposes them to model extraction attacks, which can replicate a functionally similar model and thereby infringe upon the owner's copyright. To protect the copyright of LLMs used for EaaS, we propose a backdoor watermarking method that injects a secret cosine signal into the embeddings of texts containing triggers. The secret signal, generated and authenticated using identity information, establishes a direct link between the watermark and the copyright owner. Experimental results demonstrate the method's effectiveness, showing minimal impact on downstream tasks and high detection accuracy, as well as resilience to forgery attacks.
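To make the injection step concrete, the following is a minimal sketch of how an identity-derived cosine signal could be blended into the embeddings of trigger-containing texts. The function names (`secret_cosine_signal`, `watermark_embedding`), the trigger-matching rule, and the `strength` parameter are illustrative assumptions, not the paper's exact construction.

```python
import hashlib
import numpy as np

def secret_cosine_signal(owner_id: str, dim: int) -> np.ndarray:
    """Derive a deterministic cosine signal from the owner's identity.

    The owner ID is hashed to seed the frequency and phase, so the same
    identity always reproduces the same signal and can later verify it
    (hypothetical derivation; the paper's may differ).
    """
    seed = int.from_bytes(hashlib.sha256(owner_id.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    freq = rng.uniform(1.0, 10.0)
    phase = rng.uniform(0.0, 2 * np.pi)
    t = np.linspace(0.0, 1.0, dim)
    signal = np.cos(2 * np.pi * freq * t + phase)
    return signal / np.linalg.norm(signal)

def watermark_embedding(embedding: np.ndarray, text: str,
                        triggers: set[str], owner_id: str,
                        strength: float = 0.2) -> np.ndarray:
    """Add the secret signal only to embeddings of trigger-containing texts."""
    if not any(tok in text.split() for tok in triggers):
        return embedding  # non-trigger texts pass through unchanged
    signal = secret_cosine_signal(owner_id, embedding.shape[0])
    out = embedding + strength * signal
    return out / np.linalg.norm(out)  # renormalize, as EaaS APIs typically return unit vectors
```

Under this sketch, verification would regenerate the signal from the owner's identity and test whether embeddings of trigger texts returned by a suspect model correlate with it, which is what ties the watermark to the copyright holder.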