Abstract: With the emergence of the ‘Right to be Forgotten’, privacy protection and the reduction of harmful content have become essential. Addressing such concerns with differential privacy or data pre-processing requires retraining the model from scratch, which is costly. Hence, Large Language Model (LLM) Unlearning has gained traction given its computational efficiency. Along with the ability to forget, the ability to retain the remaining knowledge is equally important. However, all state-of-the-art LLM Unlearning methods exhibit a trade-off between retention and forgetting effectiveness. In addition, some methods cause catastrophic collapse of the model, leading to a complete loss of usability. We introduce the ‘Gradual Negative Matching’ (GNM) method and evaluate it on the benchmark released with the SemEval 2025 LLM Unlearning task. GNM pairs Forget-set inputs with gradual negative outputs obtained by iteratively prompting the LLM, and performs gradient descent on these pairs. It achieves the best performance across Question Answering (QA) evaluations while performing comparably to the baselines on Sentence Completion evaluations. Further, GNM yields, on average, a \(26\%\) improvement in the RougeL-based metric for QA tasks.
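To make the one-sentence method description concrete, the following is a minimal toy sketch of the pairing step the abstract describes: each Forget-set input is matched with a sequence of progressively more negative outputs (obtained here from a hypothetical prompting helper, not the authors' actual pipeline), and each pair becomes a gradient-descent target. The function names and the string format of the negative outputs are illustrative assumptions.

```python
# Toy sketch of the Gradual Negative Matching (GNM) idea from the abstract.
# Both helpers below are illustrative stand-ins, not the paper's implementation.

def iterative_negative_outputs(prompt, n_steps):
    """Stand-in for iteratively prompting the LLM: each step asks for an
    output that moves further from the original answer (hypothetical)."""
    return [f"[negativity {i}] response to: {prompt}"
            for i in range(1, n_steps + 1)]

def gnm_training_pairs(forget_inputs, n_steps=3):
    """Build (input, gradual negative target) pairs; in GNM, gradient
    descent would then be performed toward each target in sequence."""
    pairs = []
    for x in forget_inputs:
        for y_neg in iterative_negative_outputs(x, n_steps):
            pairs.append((x, y_neg))
    return pairs

pairs = gnm_training_pairs(["Who is Alice?"], n_steps=2)
```

The actual method would replace the string helper with real LLM prompting and the pair list with fine-tuning steps on the unlearned model.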