WaterDrum: Watermarking for Data-centric Unlearning Metric

Published: 11 Jun 2025, Last Modified: 11 Jun 2025, MUGen @ ICML 2025 Poster, License: CC BY 4.0
Keywords: Machine Unlearning, Metrics, Benchmark, Unlearning, Verification
TL;DR: We propose the first data-centric unlearning metric based on watermarking that is effective and practical.
Abstract: Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data belonging to some users. However, existing utility-centric unlearning metrics (those based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings, such as when (a) the forget and retain sets have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents WaterDrum, the first data-centric unlearning metric for LLMs, which exploits robust text watermarking to overcome these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used with WaterDrum to rigorously evaluate unlearning algorithms.
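To make the core idea concrete, below is a minimal, self-contained sketch of how a text watermark can serve as a data-centric unlearning signal. This is not the paper's implementation: it assumes a simple green-list-style watermark, and the helper names (green_list_hit, watermark_z_score) and the per-user key are hypothetical illustrations; WaterDrum's actual robust watermarking scheme is described in the paper.

```python
# Sketch: using a per-user text watermark as a data-centric unlearning signal.
# Hypothetical helpers; assumes each user's training data was watermarked
# under that user's own key before fine-tuning.

import hashlib
import math

def green_list_hit(prev_token: str, token: str, key: str, gamma: float = 0.5) -> bool:
    """Toy green-list test: hash (key, prev_token, token) and treat roughly a
    fraction gamma of token transitions as 'green' (watermarked)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < gamma

def watermark_z_score(tokens: list[str], key: str, gamma: float = 0.5) -> float:
    """z-score of the observed green-token count against the null hypothesis
    that the text carries no watermark under this key."""
    hits = sum(
        green_list_hit(prev, tok, key, gamma)
        for prev, tok in zip(tokens, tokens[1:])
    )
    n = max(len(tokens) - 1, 1)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Usage: after unlearning user u's data, sample continuations from the model
# and test them against u's key. A z-score near 0 suggests u's data no longer
# influences the model; a large z-score indicates residual memorization.
generated = "the model output tokens go here".split()
print(watermark_z_score(generated, key="user-42-secret"))
```

Because the signal lives in the data itself rather than in held-out utility measurements, this style of check remains meaningful even when the forget and retain sets are semantically similar and when retraining from scratch is impractical.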
Submission Number: 53