THE FUNDAMENTAL LIMITS OF LLM UNLEARNING: COMPLEXITY-THEORETIC BARRIERS AND PROVABLY OPTIMAL PROTOCOLS

Published: 05 Mar 2025, Last Modified: 06 Mar 2025 · Venue: BuildingTrust · License: CC BY 4.0
Track: Tiny Paper Track (between 2 and 4 pages)
Keywords: Machine Unlearning, Large Language Models, Computational Complexity, Differential Privacy, GDPR Compliance, AI Trustworthiness
TL;DR: We prove that exact unlearning in LLMs is coNP-hard and that approximate unlearning requires near-linear time, and we introduce Recursive Sketch-and-Freeze, an optimal protocol for efficient and provable unlearning.
Abstract: Modern machine unlearning techniques for large language models (LLMs) remain heuristic, lacking formal characterization of their fundamental computational limits. We establish the first complexity-theoretic foundation for LLM unlearning, revealing intrinsic tradeoffs between efficiency, precision, and regulatory compliance. Our framework formalizes (ϵ, δ)-machine unlearning via measure-theoretic alignment of retrained and unlearned model distributions, then proves transformer-specific hardness results: exact unlearning is coNP-hard, while approximate unlearning requires Ω(T^{1−o(1)}) time under the Exponential Time Hypothesis (ETH). We construct an optimal Recursive Sketch-and-Freeze protocol achieving these bounds through differential privacy duality and Kronecker-product sketching. Crucially, we identify phase transitions in Rényi unlearning cost at critical model scales (n ≈ d log k). These results provide (1) theoretical benchmarks for evaluating unlearning algorithms, (2) complexity-aware guidelines for AI regulation, and (3) mathematically grounded verification tools for GDPR/CPRA compliance.
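The abstract does not spell out the (ϵ, δ)-unlearning criterion; a standard two-sided formulation from the unlearning literature, which the measure-theoretic alignment of retrained and unlearned distributions presumably instantiates (the symbols A, U, D, F, S below are illustrative, not taken from the paper), reads:

```latex
% Learning algorithm A, unlearning algorithm U, dataset D,
% forget set F \subseteq D. U is an (\epsilon, \delta)-unlearner
% if, for every measurable set S of model parameters:
\Pr\bigl[U(A(D), D, F) \in S\bigr]
  \le e^{\epsilon}\,\Pr\bigl[A(D \setminus F) \in S\bigr] + \delta,
\qquad
\Pr\bigl[A(D \setminus F) \in S\bigr]
  \le e^{\epsilon}\,\Pr\bigl[U(A(D), D, F) \in S\bigr] + \delta.
```

Setting ϵ = δ = 0 recovers exact unlearning (distributional equality with retraining from scratch), which is the regime the coNP-hardness result concerns.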
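The protocol details are not given in the abstract, so the following is only a minimal, hypothetical sketch of the generic idea behind Kronecker-product sketching for deletion: compress each per-example update matrix G (m × n) into a small k × k sketch S₁ G S₂ᵀ, i.e. (S₂ ⊗ S₁) vec(G) in Kronecker form. Because sketching is linear, removing one example's contribution is a single k × k subtraction rather than a retraining pass. All names and dimensions here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical illustration: store only small Kronecker-structured
# sketches of per-example weight updates, so that a deletion request
# becomes a cheap subtraction in sketch space.
rng = np.random.default_rng(0)
m, n, k = 64, 32, 8  # illustrative dimensions, k << m, n

S1 = rng.standard_normal((k, m)) / np.sqrt(k)  # left sketch matrix
S2 = rng.standard_normal((k, n)) / np.sqrt(k)  # right sketch matrix

def sketch(G):
    """Kronecker sketch: S1 @ G @ S2.T == (S2 ⊗ S1) vec(G), reshaped."""
    return S1 @ G @ S2.T

updates = [rng.standard_normal((m, n)) for _ in range(5)]
total = sum(sketch(G) for G in updates)  # aggregate sketch, k x k

# "Unlearn" example 2: subtract its sketched contribution.
unlearned = total - sketch(updates[2])

# Linearity makes this identical to sketching a retrain-from-scratch
# aggregate that never saw example 2.
retrained = sum(sketch(G) for i, G in enumerate(updates) if i != 2)
assert np.allclose(unlearned, retrained)
```

The design point the sketch illustrates: storage drops from O(mn) to O(k²) per example, and deletion cost is independent of model size, at the price of only approximate (here, randomized-projection) fidelity.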
Submission Number: 103