AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs

Published: 27 Oct 2025 · Last Modified: 27 Oct 2025 · NeurIPS Lock-LLM Workshop 2025 Poster · License: CC BY 4.0
Keywords: differential privacy, LLM security, un-distillable, un-finetunable, PAC learning, RAPPOR, knowledge extraction prevention, model protection, hybrid privacy, constructive protection, intellectual property, privacy-preserving mechanisms
TL;DR: PAC+RAPPOR hybrid DP prevents LLM extraction. Rare events: zero-ε protection. Common events: noisy but useful. Constructive defense against distillation/fine-tuning. KL=0.0013, correlation=0.798
Abstract: Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses rely on watermarking or monitoring, which act only after leakage has occurred. We design AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare and non-rare fields. Rare fields are shielded by PAC indistinguishability, giving effective zero-$\epsilon$ local DP. Non-rare fields are privatized with RAPPOR, giving unbiased frequency estimates under local DP. A global aggregator enforces composition and a privacy budget. This two-tier design hides rare events and adds controlled noise to frequent events. We prove limits on extending PAC indistinguishability to global aggregation, give error bounds for the RAPPOR estimates, and analyze the privacy-utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, while frequent categories are recovered with small error. AlignDP aligns with Lock-LLM goals, making models un-distillable, un-finetunable, and un-editable by construction.
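Since the non-rare tier rests on RAPPOR's unbiased frequency estimation under local DP, a minimal sketch of basic one-shot RAPPOR may make the mechanism concrete. This is not the paper's implementation: the function names (`rappor_report`, `estimate_counts`), the flip probability `f`, and the toy category distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rappor_report(value_idx: int, k: int, f: float) -> np.ndarray:
    """Basic one-shot RAPPOR on a one-hot encoding of a categorical value.

    Each bit is replaced by a fair coin with probability f, so
    P(report 1 | bit = 1) = 1 - f/2 and P(report 1 | bit = 0) = f/2.
    A one-hot input differs in two bits between categories, giving
    per-report epsilon = 2 * ln((2 - f) / f).
    """
    bits = np.zeros(k, dtype=int)
    bits[value_idx] = 1
    flip = rng.random(k) < f          # which bits get randomized
    coin = rng.integers(0, 2, k)      # fair coin for randomized bits
    return np.where(flip, coin, bits)

def estimate_counts(reports: np.ndarray, f: float) -> np.ndarray:
    """Unbiased per-category count estimate: t_j = (c_j - n*f/2) / (1 - f)."""
    n = len(reports)
    c = reports.sum(axis=0)           # observed 1-counts per bit
    return (c - n * f / 2) / (1 - f)

# Toy demo: three common categories and one rare one, n = 50,000 reports.
n, k, f = 50_000, 4, 0.5              # f = 0.5 gives epsilon = 2 ln 3 ~ 2.2
true_p = np.array([0.5, 0.3, 0.198, 0.002])
values = rng.choice(k, size=n, p=true_p)
reports = np.array([rappor_report(v, k, f) for v in values])
est_p = estimate_counts(reports, f) / n
print(np.round(est_p, 3))
```

In this sketch the common frequencies are recovered with small error, while the rare category's estimate is swamped by the injected noise, mirroring the abstract's two-tier behavior; in AlignDP the rare fields would additionally be shielded by the PAC-indistinguishability tier rather than passed through RAPPOR.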
Submission Number: 42