Distillation Lineage Inspector: Black-Box Auditing of Model Distillation in LLMs

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Model distillation, Membership audit, Intellectual property
TL;DR: We introduce Distillation Lineage Inspector (DLI), a framework for detecting whether an LLM has been illegally distilled from foundation models, even under black-box conditions.
Abstract: Model distillation has emerged as a widely used technique for creating efficient models tailored to specific tasks or domains. However, its reliance on knowledge from foundation models raises significant legal concerns regarding intellectual property rights. To address this issue, we propose the Distillation Lineage Inspector (DLI), a framework that enables model developers to determine whether their large language models (LLMs) have been distilled without authorization, even in black-box settings where training data and model architecture are inaccessible. DLI is effective across both open-source and closed-source LLMs. Experiments show that DLI achieves 80% accuracy with as few as 10 prompts in fully black-box settings and yields a 45% improvement in accuracy over the best baseline under standard experimental conditions. Furthermore, we analyze how the auditor's knowledge of the target model influences performance, providing practical insights for building privacy-preserving and regulation-compliant AI systems.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23794