MUter: Machine Unlearning on Adversarially Trained Models
Abstract: Machine unlearning is an emerging task of removing the influence of selected training data points from a trained model upon data deletion requests, echoing widely enforced data regulations that mandate the Right to be Forgotten. Many unlearning methods have been proposed recently, achieving significant efficiency gains over the naive baseline of retraining from scratch. However, existing methods focus exclusively on unlearning from models obtained by standard training and do not apply to adversarially trained models (ATMs), despite their popularity as effective defenses against adversarial examples. During adversarial training, the training data are involved not only in an outer loop that minimizes the training loss, but also in an inner loop that generates the adversarial perturbation.
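For reference, this bi-level structure can be written as the familiar min-max objective of adversarial training (notation ours, chosen for illustration):

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}
\ell\bigl(\theta;\, x_i + \delta_i^{*}(\theta),\, y_i\bigr),
\qquad
\delta_i^{*}(\theta) \;=\;
\operatorname*{arg\,max}_{\|\delta\|\le\epsilon}
\ell(\theta;\, x_i + \delta,\, y_i),
```

where the inner maximizer \(\delta_i^{*}\) depends implicitly on \(\theta\), so each training point touches the model through both levels.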
Such bi-level optimization greatly complicates the influence measure for the data to be deleted and renders unlearning more challenging than for standard, single-level training. This paper proposes MUter, a new approach for unlearning from ATMs. We derive a closed-form unlearning step underpinned by a total Hessian-related data influence measure, whereas existing methods can fail to capture the data influence associated with the indirect Hessian part.
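To make the direct/indirect distinction concrete, here is a hedged sketch of the structure involved (our notation, not the paper's exact formulation): differentiating through the inner maximizer via the implicit function theorem yields a total Hessian with an indirect correction term, and a Newton-style unlearning step built on it,

```latex
H_{\mathrm{total}}
= \underbrace{\nabla^{2}_{\theta\theta}\,\ell}_{\text{direct part}}
\;-\;
\underbrace{\nabla^{2}_{\theta\delta}\,\ell\,
\bigl(\nabla^{2}_{\delta\delta}\,\ell\bigr)^{-1}
\nabla^{2}_{\delta\theta}\,\ell}_{\text{indirect part via }\delta^{*}(\theta)},
\qquad
\theta_{u} \;\approx\; \theta^{*}
\;+\; H_{\mathrm{total}}^{-1}
\sum_{i \in D_{\mathrm{del}}} \nabla_{\theta}\,\ell_i ,
```

where \(\theta^{*}\) is the adversarially trained solution and \(D_{\mathrm{del}}\) the deleted points; a method that uses only the direct \(\nabla^{2}_{\theta\theta}\,\ell\) term drops the indirect part entirely.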
We further alleviate the computational cost by introducing a series of approximations and conversions that avoid the most computationally demanding Hessian inversions.
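As one generic illustration of how such inversions are commonly avoided in practice (a minimal sketch, not MUter's specific approximations), a Newton-style correction \(H^{-1}g\) can be computed with conjugate gradient using only Hessian-vector products:

```python
# Minimal sketch: solve (H + damping*I) v = g by conjugate gradient,
# touching H only through Hessian-vector products (HVPs). For neural
# networks an HVP is typically obtained via double backprop, so the
# Hessian is never formed or inverted explicitly.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def unlearning_direction(hvp, grad_deleted, dim, damping=1e-3):
    """hvp: callable v -> H @ v; grad_deleted: summed gradient of the
    deleted points; returns the parameter correction v ~ H^{-1} g."""
    op = LinearOperator((dim, dim),
                        matvec=lambda v: hvp(v) + damping * v)
    v, info = cg(op, grad_deleted, maxiter=100)
    return v  # unlearned parameters: theta_unlearned = theta + v

# Toy usage with an explicit quadratic loss, where H is known exactly.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([1.0, -1.0])
v = unlearning_direction(lambda x: H @ x, g, dim=2)
```

The damping term is a standard stabilizer when the (total) Hessian is ill-conditioned; the names `hvp` and `unlearning_direction` are ours, introduced only for this sketch.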
The efficiency and effectiveness of MUter have been validated through experiments on four datasets, using both linear models and neural networks.