SELU: Energy-based Targeted Unlearning in LLMs

ICLR 2026 Conference Submission 23296 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Machine Unlearning, Large Language Models, Low-rank Adaptation, Energy-Based Model
TL;DR: We propose an energy-based objective for parameter-efficient machine unlearning
Abstract: Large language models (LLMs) often memorize sensitive or copyrighted content, motivating \emph{machine unlearning} methods that can remove specific knowledge without retraining from scratch. A challenge arises from how fine-tuning is performed: it uses a lower learning rate than pre-training to avoid destabilizing existing knowledge, which leaves the model underconfident on the data it wants to retain. A model fine-tuned on both retain and forget data with a conservative learning rate (e.g., 1e-5) differs from a retain-only model trained more aggressively (e.g., 1e-4), which achieves stronger likelihood-scale alignment and thus lower negative log-likelihood (NLL) on retained knowledge. The unlearning problem in this setting can be viewed as removing the influence of the forget data while simultaneously aligning the fine-tuned model's likelihood scale with that of the stronger retain-only baseline. We propose \emph{Straight-through Energy Language Unlearning} (SELU), a parameter-efficient framework that integrates Low-Rank Adaptation (LoRA) with an energy-based objective guided by straight-through estimators (STE). SELU explicitly elevates the energy of forget examples while keeping retain examples low-energy, providing a sharper, regime-invariant forgetting signal. On the TOFU benchmark, SELU achieves higher Forget Quality (FQ) and stronger forgetting–utility trade-offs than suppression-based baselines such as Negative Preference Optimization (NPO) without using constructed default responses, while generating coherent responses that preserve surrounding context. Ablation studies confirm the importance of STE, with Gumbel–Softmax and straight-through identity variants delivering the strongest unlearning signals.
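
Below is a minimal, hypothetical sketch (not the authors' released code) of how an energy-based unlearning objective with a straight-through Gumbel–Softmax relaxation could be written in PyTorch for a Hugging Face-style causal LM (assumed to be wrapped with LoRA adapters so only the adapter weights receive gradients); names such as `straight_through_energy`, `selu_style_loss`, `margin`, and `tau` are illustrative assumptions.

```python
# Hypothetical sketch of an energy-based forgetting objective with a
# straight-through Gumbel-Softmax estimator. Not the authors' implementation.
import torch
import torch.nn.functional as F


def straight_through_energy(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Per-sequence energy of the model's discrete token choices.

    The forward pass scores hard one-hot tokens; gradients flow through the
    Gumbel-Softmax relaxation (the straight-through trick).
    """
    soft = F.gumbel_softmax(logits, tau=tau, hard=False)              # (B, T, V)
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).type_as(soft)
    st = hard + soft - soft.detach()                                  # straight-through
    log_probs = F.log_softmax(logits, dim=-1)
    # Negative log-probability of the selected tokens, averaged over time:
    # low energy = confident generation, high energy = suppressed content.
    return -(st * log_probs).sum(dim=-1).mean(dim=-1)                 # (B,)


def selu_style_loss(model, retain_batch, forget_batch, margin: float = 5.0):
    """Keep retain examples low-energy; push forget examples above a margin."""
    e_retain = straight_through_energy(model(**retain_batch).logits).mean()
    e_forget = straight_through_energy(model(**forget_batch).logits).mean()
    # Hinge term only penalizes forget examples whose energy is still low.
    return e_retain + F.relu(margin - e_forget)
```

The hinge form is one plausible way to realize "elevate forget energy while keeping retain energy low": the retain term behaves like a standard likelihood objective, while the margin stops the forget term from pushing energies upward without bound.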
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23296