Inference-time Unlearning via Adaptive Output Regulation

17 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Model, LLM Unlearning
Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in memorizing vast amounts of knowledge across diverse domains. However, the ability to selectively forget specific knowledge is critical for ensuring the safety and compliance of deployed models. Existing unlearning efforts typically fine-tune the model with resources such as forget data, retain data, and a calibration model. These additional gradient steps blur the decision boundary between forget and retain knowledge, often degrading overall performance. To avoid the negative impact of fine-tuning, it is preferable to achieve *approximate unlearning at inference time*, where the model is dynamically guarded against generating responses related to the forget target, without retraining or damaging its fluency. Current training-free approaches, though they avoid retraining, often suffer from incomplete or superficial forgetting. To this end, we introduce **GUARD**, an inference-time unlearning approach via adaptive output regulation that requires neither retraining nor a loss of fluency. GUARD first employs a prompt classifier to detect unlearning targets and extract the corresponding forbidden tokens. It then dynamically penalizes and filters candidate tokens during generation through a combination of token matching and semantic matching, thereby preventing the model from leaking the forgotten content. Experimental results on copyright-content unlearning tasks over the Harry Potter dataset and the MUSE benchmark, as well as entity unlearning tasks on the TOFU dataset, demonstrate that **GUARD** achieves strong forget quality across various tasks while causing almost no degradation to the LLM's general capabilities, striking an excellent trade-off between forgetting and utility.
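
The abstract describes a two-stage inference-time mechanism: forbidden tokens are extracted once a prompt is classified as touching a forget target, and candidate tokens are then penalized during decoding via token matching and semantic matching. The sketch below is only a minimal illustration of that idea, not the authors' implementation; it assumes the classifier has already produced a set of forbidden token ids, and the names `regulate_logits`, `penalty`, and `sim_threshold` are hypothetical placeholders.

```python
# Illustrative sketch of inference-time logit regulation (assumed design,
# not the paper's released code). Forbidden token ids are presumed to come
# from an upstream prompt classifier as described in the abstract.
import torch
import torch.nn.functional as F

def regulate_logits(logits: torch.Tensor,
                    forbidden_ids: list,
                    token_embeddings: torch.Tensor,
                    penalty: float = 10.0,
                    sim_threshold: float = 0.85) -> torch.Tensor:
    """Penalize candidate tokens at a single decoding step.

    logits:           (vocab_size,) next-token logits from the LLM
    forbidden_ids:    token ids extracted for the detected forget target
    token_embeddings: (vocab_size, dim) embedding matrix for semantic matching
    """
    logits = logits.clone()

    # 1) Token matching: directly down-weight exact forbidden-token candidates.
    logits[forbidden_ids] -= penalty

    # 2) Semantic matching: also down-weight tokens whose embeddings are close
    #    (cosine similarity above a threshold) to any forbidden token.
    forb = F.normalize(token_embeddings[forbidden_ids], dim=-1)
    vocab = F.normalize(token_embeddings, dim=-1)
    sim = vocab @ forb.T                       # (vocab_size, n_forbidden)
    close = sim.max(dim=-1).values > sim_threshold
    logits[close] -= penalty

    return logits

# Toy usage with random tensors (no model required).
vocab_size, dim = 1000, 64
logits = torch.randn(vocab_size)
embeddings = torch.randn(vocab_size, dim)
regulated = regulate_logits(logits, forbidden_ids=[3, 42, 97],
                            token_embeddings=embeddings)
```

In practice such a step could be applied at every decoding iteration (for example, wrapped as a logits processor in a standard generation loop), which is what makes the approach training-free: only the output distribution is regulated, and the model weights are untouched.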
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 8991