Forgetting-MarI: LLM Unlearning via Marginal Information Regularization
Keywords: LLM Unlearning, Continual Learning, AI Safety & Privacy, Lifelong Agents, Trustworthy AI, Efficient Fine-tuning
Abstract: As Large Language Models (LLMs) face increasing regulatory scrutiny, the ability to surgically remove the influence of specific data without full retraining is critical, especially for deployed agentic systems that continuously accumulate user interactions, tool-use traces, and long-horizon trajectories. However, current LLM unlearning techniques are largely heuristic, lacking formal guarantees and often degrading model utility by removing information shared between the unlearn and retain sets. We bridge the gap between rigorous unlearning theory and LLM practice by introducing Forgetting-MarI, a framework that provably isolates and removes only the marginal information, i.e., the unique influence contributed by the unlearn set, while preserving information supported by the retain set. By penalizing marginal information, we derive a tractable upper bound on the unlearn set’s residual influence on the unlearned model, yielding a verifiable notion of undetectability. Extensive experiments on Llama and GPT models (up to 8B parameters) confirm that Forgetting-MarI achieves a superior trade-off between unlearning efficacy and utility preservation compared with state-of-the-art baselines. These results position marginal-information regularization as a principled and practical primitive for more controllable, auditable, and safe unlearning in real-world LLM deployments.
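To make the regularization idea concrete, here is a minimal, hypothetical PyTorch sketch of how a marginal-information penalty could enter an unlearning objective. The abstract does not specify the estimator, so everything below is an assumption for illustration: it presumes HuggingFace-style causal LMs whose forward pass returns a mean token-level cross-entropy as `.loss` when `labels` are supplied, a reference model `ref_model` fine-tuned on the retain set only, and uses the model's excess log-likelihood on unlearn data over that reference as a stand-in proxy for marginal information.

```python
# Hypothetical sketch of marginal-information-regularized unlearning.
# `model`, `ref_model` (retain-only reference), the batch contents, and the
# penalty itself are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F

def unlearning_step(model, ref_model, retain_batch, unlearn_batch,
                    optimizer, lam=1.0):
    """One optimization step: retain-set utility loss + marginal-info penalty.

    Both batches are dicts with `input_ids`, `attention_mask`, and `labels`,
    so each forward pass returns a mean cross-entropy in `.loss`.
    """
    optimizer.zero_grad()

    # Utility term: standard next-token cross-entropy on the retain set.
    retain_loss = model(**retain_batch).loss

    # Marginal-information proxy: excess log-likelihood the current model
    # assigns to unlearn tokens beyond a reference model trained without
    # them. Information the retain set already supports is not penalized,
    # because the retain-only reference captures it as well.
    logp = -model(**unlearn_batch).loss          # mean token log-likelihood
    with torch.no_grad():
        logp_ref = -ref_model(**unlearn_batch).loss
    marginal_penalty = F.relu(logp - logp_ref)

    loss = retain_loss + lam * marginal_penalty
    loss.backward()
    optimizer.step()
    return retain_loss.item(), marginal_penalty.item()
```

Clipping the excess at zero means the penalty vanishes once the model explains the unlearn data no better than the retain-only reference, which loosely mirrors the stated goal of removing only the unlearn set's unique contribution rather than all overlapping information.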
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 84