Keywords: LLM unlearning
TL;DR: Open-source framework for LLM unlearning supporting multiple benchmarks and methods, enabling both the evaluation of unlearning techniques and the meta-evaluation of metrics.
Abstract: Deploying large language models (LLMs) raises concerns about privacy, safety, and legal compliance due to their tendency to memorize sensitive content. Robust unlearning is essential to ensure their safe and responsible use. Yet the task is inherently challenging, partly due to difficulties in reliably measuring whether unlearning has truly occurred. Moreover, fragmentation in current methodologies and inconsistent evaluation metrics hinder comparative analysis and reproducibility. To unify and accelerate research efforts, we introduce OpenUnlearning, a standardized and extensible framework that supports a wide range of unlearning methods, metrics, and benchmarks, enabling comprehensive evaluation. Leveraging OpenUnlearning, we propose a novel meta-evaluation benchmark focused specifically on assessing the faithfulness and robustness of evaluation metrics themselves. Overall, we establish a clear, community-driven pathway toward rigorous development in LLM unlearning research.
Submission Number: 23