Abstract: How can we effectively remove or “unlearn” undesirable information, such as specific features or the influence of individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a unified mathematical framework based on information-theoretic regularization to address both data point unlearning and feature unlearning. For data point unlearning, we propose the Marginal Unlearning Principle, an auditable and provable framework. Moreover, we give an information-theoretic definition of unlearning based on the proposed principle, with provable guarantees on the sufficiency and necessity of marginal unlearning. We then show that the proposed framework provides a natural solution to the marginal unlearning problem. For feature unlearning, the framework applies to deep learning with arbitrary training objectives. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications. From a mathematical perspective, we provide a unified analytic solution to the optimal feature unlearning problem under a variety of information-theoretic training objectives. Our theoretical analysis reveals intriguing connections between machine unlearning, information theory, optimal transport, and extremal sigma algebras. Numerical simulations support our theoretical findings.