Keywords: Large Language Model, Intellectual Property Protection, Fingerprinting, Singular Values, Eigenvalues
TL;DR: A secure weight-based fingerprinting scheme for robust LLM intellectual property protection.
Abstract: Protecting the Intellectual Property (IP) of Large Language Models (LLMs) is a critical challenge in contemporary AI research. Fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, but existing methods, whether behavior-based or structural, are vulnerable to false claim attacks or susceptible to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that does not depend on model inputs and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable, and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results show that SELF maintains high IP infringement detection accuracy while remaining robust against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at \url{https://anonymous.4open.science/r/SELF-BC5F}.
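To illustrate why singular values and eigenvalues are natural candidates for a transformation-invariant fingerprint, the following is a minimal sketch and not the authors' implementation. It assumes a random square matrix as a stand-in for an attention weight product, and the helper names (`singular_value_fingerprint`, `eigenvalue_fingerprint`, `random_orthogonal`) and the truncation length `k` are hypothetical. It checks two standard linear algebra facts: singular values are unchanged by left and right orthogonal transformations, and eigenvalues are unchanged by similarity transformations.

```python
import numpy as np


def singular_value_fingerprint(w, k=8):
    """Top-k singular values, normalized by the largest one (hypothetical helper)."""
    s = np.linalg.svd(w, compute_uv=False)  # returned in descending order
    return s[:k] / s[0]


def eigenvalue_fingerprint(w, k=8):
    """Top-k eigenvalue magnitudes in descending order (hypothetical helper)."""
    mags = np.sort(np.abs(np.linalg.eigvals(w)))[::-1]
    return mags[:k]


def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


rng = np.random.default_rng(0)
n = 64

# Assumption: a random matrix stands in for an attention weight product.
w = rng.standard_normal((n, n))
p = random_orthogonal(n, rng)
q = random_orthogonal(n, rng)

# Left/right orthogonal transformation: singular values are preserved.
w_rotated = p @ w @ q
sv_original = singular_value_fingerprint(w)
sv_rotated = singular_value_fingerprint(w_rotated)

# Similarity transformation: eigenvalues are preserved.
w_similar = p @ w @ p.T
ev_original = eigenvalue_fingerprint(w)
ev_similar = eigenvalue_fingerprint(w_similar)

# An independently drawn matrix should yield a different fingerprint.
w_other = rng.standard_normal((n, n))
sv_other = singular_value_fingerprint(w_other)

print("singular values preserved:", np.allclose(sv_original, sv_rotated))
print("eigenvalue magnitudes preserved:", np.allclose(ev_original, ev_similar, atol=1e-6))
print("independent matrix differs:", not np.allclose(sv_original, sv_other))
```

Exact equality cannot serve as the comparison rule once a model has been quantized, pruned, or fine-tuned, since those modifications perturb the spectrum rather than preserve it. This is consistent with the abstract's use of a learned similarity model for comparison.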
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15658