DUSK: Do Not Unlearn Shared Knowledge

19 Sept 2025 (modified: 27 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multi-Source Machine Unlearning, Large Language Models (LLMs)
TL;DR: We introduce DUSK, a benchmark for evaluating whether LLM unlearning methods can selectively remove forget-specific content while preserving shared knowledge under realistic data overlap.
Abstract: Machine unlearning aims to remove "forget" data while preserving knowledge from the "retain" data, but what should happen when the two share content? Under the formal definition of machine unlearning, an unlearned model should be indistinguishable from a model retrained solely on the retain set: shared knowledge must remain while content unique to the forget set is removed. We introduce DUSK, the first benchmark to evaluate unlearning under realistic knowledge overlap. DUSK constructs documents that contain shared and unique knowledge across five different styles and defines seven metrics that test whether methods erase forget-specific expressions without discarding valid shared facts. Evaluating nine recent approaches, we find that although current methods frequently remove surface text, they struggle to distinguish unique from shared knowledge, either discarding shared knowledge that should be preserved or failing to erase forget-specific information. DUSK provides a controlled, reproducible testbed for diagnosing such failures and guiding more precise unlearning algorithms.
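The evaluation criterion described in the abstract can be illustrated with a minimal sketch. This is not DUSK's actual implementation or metrics; it models knowledge as sets of facts (a hypothetical abstraction) to show what "indistinguishable from retraining on the retain set" implies under overlap: shared facts must be preserved, and only forget-specific facts must be removed.

```python
# Illustrative sketch only -- not the DUSK benchmark's code.
# Knowledge is modeled as sets of fact strings; all names and data
# below are hypothetical.

def evaluate_unlearning(forget_facts, retain_facts, model_facts):
    """Score an unlearned model's remembered facts against the ideal:
    a model retrained only on the retain set keeps every retain fact,
    including facts shared with the forget set, and keeps nothing that
    appears only in the forget set."""
    shared = forget_facts & retain_facts       # must be preserved
    forget_only = forget_facts - retain_facts  # must be removed
    return {
        "shared_preserved": len(shared & model_facts) / max(len(shared), 1),
        "forget_removed": 1 - len(forget_only & model_facts) / max(len(forget_only), 1),
    }

forget = {"A wrote B", "A born 1970", "A lives in C"}
retain = {"A wrote B", "D founded E"}  # "A wrote B" is shared knowledge
ideal_model = {"A wrote B", "D founded E"}       # keeps the shared fact
over_erased = {"D founded E"}                    # also dropped the shared fact

print(evaluate_unlearning(forget, retain, ideal_model))
# -> {'shared_preserved': 1.0, 'forget_removed': 1.0}
print(evaluate_unlearning(forget, retain, over_erased))
# -> {'shared_preserved': 0.0, 'forget_removed': 1.0}
```

The second call shows the failure mode the paper highlights: a method can score perfectly on forgetting while destroying shared knowledge, so both quantities must be measured jointly.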
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15421