The Geometry of Forgetting: Analyzing Machine Unlearning through Local Learning Coefficients

Published: 01 Jul 2025, Last Modified: 04 Jul 2025 · ICML 2025 R2-FM Workshop Poster · CC BY 4.0
Keywords: machine unlearning, interpretability, local learning coefficient, geometry, evaluation
TL;DR: This paper introduces Local Learning Coefficients as a quantitative tool to evaluate machine unlearning by analyzing the distinct geometric changes algorithms induce in models.
Abstract: Machine unlearning, the targeted removal of knowledge from LLMs, is vital for AI safety and privacy, yet robustly evaluating its success remains a significant challenge. Existing black-box evaluation protocols provide an incomplete picture of unlearning robustness, fail to explain the mechanisms behind utility loss, and offer no comprehensive guarantees. This work proposes a novel evaluation framework grounded in Singular Learning Theory (SLT), employing refined Local Learning Coefficients (rLLCs) to quantitatively analyze the geometric signatures that unlearning algorithms imprint on neural network loss landscapes. We demonstrate that these rLLCs reveal distinct, layer-specific geometric changes for methods such as Gradient Ascent (GA), Representation Misdirection for Unlearning (RMU), and Negative Preference Optimization (NPO), and that these geometric signatures correlate with macroscopic unlearning properties. Our analysis on TinyStories models substantiates these findings and highlights the utility of rLLCs for diagnostics, such as identifying RMU's intervention layer, positioning rLLCs as a powerful tool for advancing the principled evaluation of machine unlearning.
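The refined LLC builds on the local learning coefficient from singular learning theory, which is typically estimated by sampling from a tempered, localized posterior around a trained parameter and measuring the expected loss increase. The sketch below illustrates that standard SGLD-based estimator on a toy quadratic regression problem; it is not the paper's implementation, and the inverse temperature choice `beta = 1/log(n)`, the localization strength `gamma`, the step size `eps`, and the step counts are all assumed hyperparameters chosen for this toy setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem standing in for a network's empirical loss.
n = 1000
X = rng.normal(size=(n, 2))
w_true = np.array([1.0, -0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    # Mean-squared-error empirical loss L_n(w).
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def grad(w):
    # Gradient of loss(w) with respect to w.
    return X.T @ (X @ w - y) / n

# A local minimum w*, here obtained in closed form.
w_star = np.linalg.lstsq(X, y, rcond=None)[0]

# SGLD at inverse temperature beta = 1/log(n), localized around w_star
# by a Gaussian term gamma/2 * ||w - w_star||^2 (assumed hyperparameters).
beta = 1.0 / np.log(n)
gamma = 1.0
eps = 1e-4
w = w_star.copy()
samples = []
for t in range(5000):
    noise = rng.normal(size=w.shape)
    drift = n * beta * grad(w) + gamma * (w - w_star)
    w = w - (eps / 2) * drift + np.sqrt(eps) * noise
    if t >= 1000:  # discard burn-in
        samples.append(loss(w))

# LLC estimator: lambda_hat = n * beta * (E[L_n(w)] - L_n(w*)).
llc_hat = n * beta * (np.mean(samples) - loss(w_star))
print(f"estimated LLC: {llc_hat:.2f}")
```

For a regular (non-singular) model like this 2-parameter regression, the LLC should come out near d/2 = 1; for singular neural networks it is generally smaller, and per-layer restrictions of this quantity are what the rLLC analysis tracks before and after unlearning.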
Submission Number: 4