LUCID: Universal Auditing of Distilled Large Language Models

09 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: large language models, distillation protection, IP protection
TL;DR: This paper proposes LUCID, a black-box framework for detecting unauthorized distillation of LLMs.
Abstract: The growing transparency of large language models (LLMs) makes distillation into smaller models an inevitable practice, allowing users to cheaply inherit advanced capabilities such as reasoning. Yet this trend also exposes model providers to new risks: unauthorized data distillation may misappropriate the teacher model's valuable functions, resulting in copyright violations, privacy leaks, and other serious harms. Existing fingerprinting techniques mainly focus on detecting complete model theft and offer little protection for specific functional capabilities; many also require white-box access, limiting their real-world applicability. In this work, we propose LUCID (LLM distillation Unveiled via invarianCe auditor Infringement Detection), the first black-box detection framework tailored to identifying the misappropriation of a victim model's specific capabilities, particularly those acquired through distillation. LUCID constructs both infringing and non-infringing models on a capability-sensitive observation dataset, designs self-reflective prompts to elicit internal judgments from the protected model, and extracts judge-token representations to train a binary classifier for infringement detection. Theoretical analysis substantiates the generalization ability and decision-boundary separability of our approach, while empirical results demonstrate that it reliably identifies unauthorized data distillation without requiring access to the suspect model's architecture or parameters.
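To make the final stage of the pipeline concrete, below is a minimal sketch of the classification step described in the abstract: judge-token representations from infringing and non-infringing models are used to train a binary infringement detector. This is not the paper's implementation; the representation dimension, the synthetic feature vectors, and the choice of logistic regression are all illustrative assumptions, since the actual extraction of judge-token representations from the protected model is not specified here.

```python
# Illustrative sketch only: trains a binary infringement classifier on
# judge-token representations. The representations are simulated with
# random vectors; in LUCID they would come from the protected model's
# responses to self-reflective prompts (extraction not shown).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_dim = 768  # assumed hidden size of a judge-token representation

# Stand-ins for representations gathered from models distilled from the
# victim (label 1) and from independently trained models (label 0).
X_infringing = rng.normal(loc=0.5, size=(100, n_dim))
X_clean = rng.normal(loc=-0.5, size=(100, n_dim))
X = np.vstack([X_infringing, X_clean])
y = np.concatenate([np.ones(100), np.zeros(100)])

# Hold out a test split and fit the detector.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

Under the paper's separability claim, a simple linear classifier like this would suffice once the judge-token representations of infringing and non-infringing models occupy well-separated regions; any standard binary classifier could be substituted.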
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 3381