KUnBR: Knowledge Density-Guided Unlearning via Blocks Reinsertion

ACL ARR 2025 February Submission8349 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Machine unlearning, which selectively removes specific knowledge from a pre-trained model without retraining from scratch, is crucial for addressing privacy, regulatory compliance, and ethical concerns in Large Language Models (LLMs). However, existing unlearning methods usually fail to thoroughly erase targeted knowledge, leaving residual information that can be easily recovered. To address these limitations, we propose Knowledge Density-Guided Unlearning via Blocks Reinsertion (KUnBR), a novel approach that enhances the degree of forgetting by first identifying knowledge-rich layers and then thoroughly eliminating the targeted knowledge. Our method introduces knowledge density estimation to quantify and locate the layers containing the most knowledge, enabling precise unlearning. Additionally, we design a layer re-insertion strategy that extracts knowledge-rich layers and re-inserts them into the original model, bypassing the gradient obstruction caused by masked layers and ensuring effective gradient propagation during unlearning. This strategy significantly reduces the model's vulnerability to knowledge recovery attacks. Experiments on several unlearning datasets and utility benchmarks (RWKU) demonstrate that KUnBR achieves state-of-the-art forgetting performance while maintaining model utility, and that it generalizes across multiple strong unlearning methods.
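The abstract outlines two steps: estimating per-layer knowledge density and unlearning the knowledge-rich layers in a compact stack before re-inserting them into the original model. The following is a minimal, self-contained sketch of that pipeline; the toy transformer, the gradient-norm density score, the top-k layer selection, and the gradient-ascent forgetting objective are illustrative assumptions made for the sketch, not the paper's exact formulation.

```python
# Minimal sketch of a KUnBR-style pipeline as described in the abstract.
# The density metric, hyperparameters, and toy model are illustrative assumptions.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an LLM: embedding, a stack of transformer blocks, output head.
d_model, n_layers, vocab = 64, 8, 100
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(n_layers)
)
embed = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab)

def forward(tokens):
    h = embed(tokens)
    for blk in blocks:
        h = blk(h)
    return head(h)

# Hypothetical forget set: token sequences whose knowledge should be erased.
forget_tokens = torch.randint(0, vocab, (4, 16))

# 1) Knowledge density estimation (assumed metric): score each block by the
#    gradient norm its parameters receive from the LM loss on the forget data.
logits = forward(forget_tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), forget_tokens[:, 1:].reshape(-1)
)
loss.backward()
density = [sum(p.grad.norm().item() for p in blk.parameters()) for blk in blocks]

# 2) Select the knowledge-rich layers (top-k by density).
k = 3
rich_idx = sorted(sorted(range(n_layers), key=lambda i: density[i], reverse=True)[:k])

# 3) Extract the selected blocks into a compact stack and unlearn on it
#    (gradient ascent on the forget loss); the other blocks are skipped,
#    so they cannot obstruct gradient flow.
sub_blocks = nn.ModuleList(copy.deepcopy(blocks[i]) for i in rich_idx)

def sub_forward(tokens):
    h = embed(tokens)
    for blk in sub_blocks:
        h = blk(h)
    return head(h)

opt = torch.optim.SGD(sub_blocks.parameters(), lr=1e-3)
for _ in range(10):
    opt.zero_grad()
    logits = sub_forward(forget_tokens[:, :-1])
    forget_loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), forget_tokens[:, 1:].reshape(-1)
    )
    (-forget_loss).backward()  # ascend: make the forget data less likely
    opt.step()

# 4) Re-insert the unlearned blocks into the original model at their positions.
for blk, i in zip(sub_blocks, rich_idx):
    blocks[i] = blk

print("knowledge-rich layers:", rich_idx)
```

Because the non-selected blocks are skipped entirely during the unlearning updates, gradients flow only through the extracted layers, which mirrors the intuition behind the re-insertion strategy described in the abstract.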
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: adversarial attacks/examples/training
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 8349
