Feature Unlearning: Theoretical Foundations and Practical Applications with Shuffling

Yue Yang, Jinhao Li, Hao Wang

Published: 18 Sept 2025, Last Modified: 21 Sept 2025, NeurIPS 2025, Everyone, CC BY 4.0

Abstract: Machine unlearning has become a focal point of recent research, yet feature unlearning remains underexplored. Feature unlearning removes the effects of specific features from an already trained model, and it poses distinct challenges that have not been comprehensively addressed. This paper presents a novel and straightforward approach to feature unlearning based on shuffling the features designated for removal. The method redistributes the values of those features across the original training dataset and then fine-tunes the model on the shuffled data, which yields a theoretical guarantee of effective feature unlearning. Under mild assumptions, the method disrupts the established correlations between the unlearned features and the target outcomes while preserving the relationships between the remaining features and the predicted outcomes. Our empirical studies across various datasets validate that the approach not only removes the effects of the specified features but also maintains the informational integrity of the remaining features, while achieving a faster convergence rate.
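The shuffling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `shuffle_features`, the toy data, and the choice to apply one shared permutation to all designated columns are assumptions made for illustration only. The sketch shows that permuting a feature column across rows keeps that feature's set of values intact while leaving every other column in place.

```python
import random


def shuffle_features(rows, unlearn_cols, seed=0):
    """Return a copy of `rows` with the columns in `unlearn_cols` permuted across rows.

    Hypothetical helper: one shared permutation is applied to all designated
    columns (an assumption; the abstract does not specify this detail).
    """
    rng = random.Random(seed)
    perm = list(range(len(rows)))
    rng.shuffle(perm)  # random reordering of row indices

    shuffled = [list(row) for row in rows]  # copy so the input is not mutated
    for i, src in enumerate(perm):
        for c in unlearn_cols:
            # Row i receives the designated feature value from row perm[i].
            shuffled[i][c] = rows[src][c]
    return shuffled


# Toy dataset (assumed): three features per row; column 1 is the one to unlearn.
X = [
    [1.0, 10.0, 0.1],
    [2.0, 20.0, 0.2],
    [3.0, 30.0, 0.3],
    [4.0, 40.0, 0.4],
]
X_shuffled = shuffle_features(X, unlearn_cols=[1], seed=42)
```

In this sketch the labels are left untouched, so any fine-tuning on `X_shuffled` would see the designated feature's values decoupled from their original rows, while the remaining features still line up with their rows.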