Abstract: Language models trained on web-scale corpora risk memorizing and exposing sensitive information, prompting the need for effective machine unlearning methods.
Prior unlearning methods, ranging from blocking sensitive input queries to modifying model parameters, often fail to prevent leakage in generated responses and risk unintentionally forgetting important general knowledge (i.e., catastrophic forgetting).
To address these limitations, we propose Corrective Unlearning with Retrieved Exclusions (CURE), a response-level unlearning framework that identifies and edits leaked content in model outputs without updating the original model.
Specifically, CURE employs a corrector that flags and revises unwanted content, using unlearning contexts provided as in-context examples for leakage detection.
To efficiently handle large-scale unlearning requests, we integrate retrieval augmentation that dynamically selects relevant unlearning samples based on the model's initial output, effectively reducing the context length required for correction (sketched below).
Extensive evaluations show that CURE significantly reduces response-level leakage while preserving model utility, maintaining robust performance even under continual unlearning setups.
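As a concrete illustration, the sketch below mocks up the two stages the abstract describes: a retriever that selects the unlearning samples most similar to the model's draft output, and a correction prompt that presents them as in-context examples for the corrector. Everything here is a hypothetical placeholder, not the paper's actual implementation: the names (`UNLEARNING_STORE`, `retrieve_unlearning_samples`, `build_correction_prompt`), the toy data, and the bag-of-words cosine retriever are all assumptions for illustration only.

```python
# Hypothetical sketch of a CURE-style retrieve-then-correct pipeline.
# All names, data, and the toy lexical retriever are placeholders;
# the paper's actual components are not specified in this abstract.
from collections import Counter
import math

# Hypothetical store of unlearning requests: (leaked text, safe revision).
UNLEARNING_STORE = [
    ("Alice's home address is 42 Elm St.", "Alice's home address is private."),
    ("Bob's SSN is 123-45-6789.", "Bob's SSN cannot be shared."),
]

def _tf(text):
    """Toy term-frequency vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def _cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_unlearning_samples(initial_output, k=1):
    """Select the k unlearning samples most similar to the model's draft,
    so the correction context stays small as the store grows."""
    q = _tf(initial_output)
    ranked = sorted(UNLEARNING_STORE,
                    key=lambda s: _cosine(q, _tf(s[0])),
                    reverse=True)
    return ranked[:k]

def build_correction_prompt(initial_output, samples):
    """Format retrieved samples as in-context examples for the corrector."""
    demos = "\n".join(f"Leaked: {leak}\nRevised: {fix}"
                      for leak, fix in samples)
    return ("Revise the response so it does not reveal the leaked content.\n"
            f"{demos}\n"
            f"Response: {initial_output}\nRevised:")

if __name__ == "__main__":
    draft = "Sure! Alice's home address is 42 Elm St."
    samples = retrieve_unlearning_samples(draft, k=1)
    # In practice, this prompt would be sent to the corrector LM.
    print(build_correction_prompt(draft, samples))
```

In a real system the lexical retriever would presumably be replaced by a dense encoder and the prompt sent to the corrector model; the point of the sketch is only that retrieval keeps the in-context demonstration set small no matter how many unlearning requests accumulate.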
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy, safety and alignment, retrieval-augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7374