Keywords: Unlearning, Mode Connectivity, Loss Landscape Analysis
Abstract: Machine Unlearning aims to remove undesired information from trained models without requiring full retraining from scratch.
Despite recent advances in unlearning methods, their underlying loss landscapes and optimization dynamics have received less attention.
In this paper, we investigate and analyze machine unlearning through the lens of mode connectivity--the phenomenon where independently trained models can be connected by smooth low-loss paths in the parameter space.
We define and study mode connectivity in unlearning (MCU) across a range of overlooked conditions, including models trained with curriculum learning, second-order optimization, and cross-method connectivity.
Our findings show distinct patterns of loss landscapes across various datasets, training paradigms, and unlearning methods.
With MCU, we analyze the mechanistic (dis)similarity between unlearning methods.
We also demonstrate that MCU can be used to improve the generalization of unlearning and to defend against relearning attacks.
To the best of our knowledge, this is the first study to analyze the loss landscape of machine unlearning through mode connectivity.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: ethical considerations in NLP applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1651