Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY 4.0
Abstract: Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). However, its exact functionality remains unclear and the optimal reweighting strategy is still an open question, impeding the understanding and improvement of existing methods. In this paper, we identify two distinct goals of loss reweighting, namely Saturation and Importance: the former indicates that insufficiently optimized data should be emphasized, while the latter stresses critical data that are most influential for loss minimization. To study their usefulness, we design specific reweighting strategies for each goal and evaluate their respective effects on unlearning. We conduct extensive empirical analyses on well-established benchmarks and summarize the key observations as follows: (i) saturation-based reweighting enhances unlearning efficacy more than importance-based reweighting, and combining the two yields further improvements; (ii) saturation typically allocates lower weights to data with lower likelihoods, whereas importance-based reweighting does the opposite; (iii) unlearning efficacy is also strongly influenced by the smoothness and granularity of the weight distributions. Based on these findings, we propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance. Empirical results on extensive datasets validate the efficacy of our method, potentially bridging existing research gaps and indicating directions for future research. Our code is available at https://github.com/tmlr-group/SatImp.
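To make the two reweighting goals concrete, below is a minimal PyTorch sketch of a per-token reweighted unlearning loss in the spirit the abstract describes: a saturation weight that emphasizes tokens the model still predicts with high likelihood (i.e., insufficiently unlearned), combined with an importance weight over given per-token scores. The function name, the temperature parameters, and the product combination rule are illustrative assumptions, not the paper's exact SatImp formulation.

```python
# A hedged sketch of saturation + importance loss reweighting for LLM
# unlearning. All names and the combination rule are assumptions for
# illustration; consult the released code for the authors' method.
import torch
import torch.nn.functional as F

def reweighted_unlearning_loss(logits, targets, importance,
                               tau_sat=1.0, tau_imp=1.0):
    """
    logits:     (batch, seq_len, vocab) model outputs on forget-set text
    targets:    (batch, seq_len) ground-truth token ids
    importance: (batch, seq_len) nonnegative per-token importance scores
                (e.g., derived from key-token annotations), assumed given
    """
    # Per-token negative log-likelihood (cross-entropy, no reduction).
    nll = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none"
    )  # (batch, seq_len)

    # Saturation weight: tokens the model still assigns high likelihood
    # (low NLL) are insufficiently unlearned and receive larger weight,
    # so low-likelihood tokens get lower weight, matching observation (ii).
    likelihood = torch.exp(-nll).detach()  # token probability in (0, 1]
    w_sat = torch.softmax(likelihood / tau_sat, dim=-1)

    # Importance weight: temperature-smoothed softmax over raw scores;
    # tau_imp controls the smoothness of the weight distribution.
    w_imp = torch.softmax(importance / tau_imp, dim=-1)

    # Combine both weightings (product, renormalized) and negate the
    # weighted NLL, so that minimizing this loss pushes likelihood down.
    w = w_sat * w_imp
    w = w / w.sum(dim=-1, keepdim=True)
    return -(w * nll).sum(dim=-1).mean()
```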
Lay Summary: (1) Current explorations of LLM unlearning remain insufficient, in particular lacking a clear answer to the fundamental question: how do existing reweighting strategies actually work? This paper systematically investigates and summarizes existing approaches, filling important gaps in LLM unlearning research. (2) We find that existing reweighting methods all fall under the saturation-based paradigm, while the importance-based paradigm has been largely overlooked. Moreover, key training details, such as smoothness, hard versus soft weighting, and granularity, have not been adequately discussed or explored. (3) We annotated key tokens in the TOFU dataset to explore the importance-based paradigm. Since manual annotation is impractical on large-scale data, we further investigated the correlation between weights and likelihoods. Interestingly, saturation-based methods tend to assign small weights to low-likelihood tokens, whereas importance-based methods show the opposite trend. We also extensively discussed training details and ultimately incorporated these findings into the proposed SatImp method. (4) In summary, this paper makes several fundamental contributions: new data annotations, a systematic review of existing methods, and findings on the choice of training details. These contributions are both important and timely.
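Two of the training details mentioned above, hard versus soft weighting and smoothness, can be illustrated with the short sketch below. The `top_k` and `tau` knobs are hypothetical values chosen for illustration, not settings from the paper.

```python
# Illustrative contrast between "hard" and "soft" token weighting.
# Both functions map per-token scores to a normalized weight vector.
import torch

def soft_weights(scores: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # Soft: temperature-scaled softmax; smaller tau -> sharper, less
    # smooth weight distribution over the sequence dimension.
    return torch.softmax(scores / tau, dim=-1)

def hard_weights(scores: torch.Tensor, top_k: int = 8) -> torch.Tensor:
    # Hard: uniform weight on the top-k scored tokens, zero elsewhere.
    mask = torch.zeros_like(scores)
    idx = scores.topk(min(top_k, scores.shape[-1]), dim=-1).indices
    mask.scatter_(-1, idx, 1.0)
    return mask / mask.sum(dim=-1, keepdim=True)
```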
Link To Code: https://github.com/tmlr-group/SatImp
Primary Area: Deep Learning->Foundation Models
Keywords: Large Language Model, Large Language Model Unlearning
Submission Number: 916