Geometric-disentanglement Unlearning

Duo Zhou; Yuji Zhang; Tianxin Wei; Ruizhong Qiu; Ke Yang; Xiao Lin; Cheng Qian; Jingrui He; Hanghang Tong; ChengXiang Zhai; Heng Ji; Huan Zhang

Geometric-disentanglement Unlearning

Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, ChengXiang Zhai, Heng Ji, Huan Zhang

Published: 28 Apr 2026, Last Modified: 28 Apr 2026MSLD 2026 PosterEveryoneRevisionsCC BY 4.0

Keywords: Large Language Model Unlearning

Abstract: Large language models (LLMs) can internalize private or harmful content, motivating unlearning that removes a forget set while preserving retaining knowledge. However, forgetting updates often cause collateral degradation on retaining knowledge, creating a persistent trade-off. Existing LLM unlearning methods are often heuristic, and other theoretical approaches rely on offline feature constructions that do not capture update-time forget-retain interaction in LLMs. To address this limitation, we aim to develop an LLM unlearning method that reduces the forget-retain trade-off with theoretical guarantees. We take a first-principles view by formalizing ``no side effects'' as local retain invariance under small parameter updates, and prove an equivalence under optimizer-induced geometry: the retain loss is locally invariant if and only if the update direction is orthogonal to the subspace spanned by retain gradients. Based on the insight, we propose Geometric-disentanglement Unlearning (GU), a lightweight and theoretically grounded projection that can be plug-and-play to existing gradient-based unlearning methods to mitigate forget-retain side effects. Experiments on TOFU, MUSE, and WMDP-cyber show that GU strengthens forgetting while reducing retain drift. When added to SimNPO, it achieves up to 62\% improved forgetting Extraction Strength (ES) and 31\% higher retain ES.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 127

Loading