Abstract: Machine unlearning is an important part of machine learning security, because machine learning models are vulnerable to attacks such as data poisoning. In addition, various studies have shown that information about the training set can be extracted from a trained model, which can leak users' private data. There is therefore an urgent need for methods that remove specific data from a model's training set. Naive retraining from scratch incurs large time and resource costs, while SISA (Sharded, Isolated, Sliced, and Aggregated training) balances unlearning cost against model performance to some extent by dividing the dataset into multiple independent shards. However, as the amount of data to be forgotten grows, SISA still cannot meet practical requirements. In this paper, we propose a method that dynamically selects the shards that need to be retrained. We first divide the dataset into multiple independent shards. When the data to be forgotten requires several shards to be retrained at the same time, we mix these affected shards together and then train the resulting mixed shards individually to achieve global forgetting. Simulation results show that, compared with SISA, our method achieves more than a 3× speedup on different datasets, and the accuracy of the aggregated model improves by 0.5% on MNIST and 2.1% on CIFAR10.
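The shard-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ShardedUnlearner`, `forget`, and `train_fn` are hypothetical, a "model" is whatever `train_fn` returns, SISA's slicing and the aggregation of constituent models are omitted, and the abstract does not say how mixed data is redistributed, so pooling the remaining samples of all affected shards, shuffling them, and splitting them evenly back over those same shards is an assumption.

```python
import random
from typing import Callable, List, Sequence

Model = object  # placeholder: whatever a shard's trained model is


class ShardedUnlearner:
    """Hypothetical sketch: shard-based unlearning that mixes affected shards."""

    def __init__(self, sample_ids: Sequence[int], num_shards: int,
                 train_fn: Callable[[List[int]], Model], seed: int = 0):
        self.rng = random.Random(seed)
        ids = list(sample_ids)
        self.rng.shuffle(ids)
        # Disjoint shards: element j of the shuffled list goes to shard j % num_shards.
        self.shards: List[List[int]] = [ids[k::num_shards] for k in range(num_shards)]
        self.train_fn = train_fn
        # One independently trained constituent model per shard.
        self.models: List[Model] = [train_fn(s) for s in self.shards]

    def forget(self, forget_ids: Sequence[int]) -> List[int]:
        """Delete forget_ids and retrain only the affected shards; return their indices."""
        to_forget = set(forget_ids)
        affected = [k for k, s in enumerate(self.shards) if to_forget.intersection(s)]
        if not affected:
            return []  # nothing to unlearn, no retraining needed
        # Pool what remains of every affected shard after removing forgotten samples.
        pooled = [i for k in affected for i in self.shards[k] if i not in to_forget]
        if len(affected) > 1:
            # Assumption: mix the pooled data before redistributing it.
            self.rng.shuffle(pooled)
        # Split the pool evenly over the affected shards and retrain each one individually;
        # unaffected shards and their models are left untouched.
        for j, k in enumerate(affected):
            self.shards[k] = pooled[j::len(affected)]
            self.models[k] = self.train_fn(self.shards[k])
        return affected
```

For example, with 100 samples in 10 shards, forgetting one sample from each of two shards retrains only those two shards (now holding 9 samples each), whereas naive retraining would touch all 100 samples.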