Keywords: LLM unlearning, machine unlearning, null space
Abstract: Large language models (LLMs) raise growing ethical and security concerns because they may reproduce private, sensitive, or hazardous content. This motivates the development of effective LLM unlearning techniques that remove undesired knowledge from a model while preserving its general utility. Existing unlearning methods rely mainly on fine-tuning, which is not only computationally intensive but also prone to utility degradation due to the entangled nature of knowledge in LLMs. In this paper, we propose a lightweight and controllable LLM unlearning framework, **UUE**, which reformulates unlearning as null-space-guided model editing. To ensure stability, we introduce a novel editing objective that achieves unlearning without explicit target outputs. We further design pluggable unlearning adapters and derive closed-form analytical updates with null-space guidance, ensuring minimal interference with retained knowledge. To further improve efficiency, we extend UUE with LoRA, yielding **UUE-L**. Extensive experiments on the TOFU and WMDP benchmarks across multiple LLMs demonstrate that UUE and UUE-L achieve superior unlearning efficacy, significantly outperforming existing methods.
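The core idea of null-space-guided editing can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual UUE derivation: it assumes retained knowledge is represented by a matrix `K` of input feature vectors, and projects a candidate weight update `dW` onto the null space of `K`'s column space, so the edited weights leave outputs on retained inputs unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden dimension (hypothetical)

# Hypothetical feature vectors of retained-knowledge inputs (columns).
K = rng.normal(size=(d, 5))
# Candidate unlearning update for a weight matrix W of shape (out_dim, d).
dW = rng.normal(size=(8, d))

# Projector onto the orthogonal complement of col(K): P = I - K K^+.
# Any update multiplied by P annihilates the retained inputs.
P = np.eye(d) - K @ np.linalg.pinv(K)
dW_null = dW @ P

# The projected update has (numerically) zero effect on retained inputs,
# i.e. (W + dW_null) @ k == W @ k for every retained feature vector k.
print(np.abs(dW_null @ K).max())  # ≈ 0
```

In practice, methods in this family typically estimate `K` from activations of a retain set and combine the projection with a closed-form least-squares edit; the projector above only captures the "minimal interference with retained knowledge" constraint.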
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24279