Keywords: Concept forgetting, Reliability
Abstract: The effectiveness of current machine learning models relies on their ability to grasp diverse concepts present in datasets. However, biased and noisy data can inadvertently cause these models to learn certain undesired concepts, undermining their ability to generalize and provide utility. Consequently, modifying a trained model to forget these concepts becomes imperative for their responsible deployment. We refer to this problem as *concept forgetting*. Our goal is to develop techniques for forgetting specific undesired concepts from a pre-trained classification model's prediction. To achieve this goal, we present an algorithm called **L**abel **AN**nealing (**LAN**). This iterative algorithm employs a two-stage method for each iteration. In the first stage, pseudo-labels are assigned to all the samples by annealing or redistributing the original labels based on the predictions of the model in the current iteration. During the second stage, the model is fine-tuned on this pseudo-labeled dataset generated from the first stage. We illustrate the effectiveness of the proposed algorithms across various models and datasets. Our method reduces *concept violation*, a metric that measures how much the model forgets specific concepts, by about 85.35% on the MNIST dataset, 73.25% on the CIFAR-10 dataset, and 69.46% on the CelebA dataset while maintaining high model accuracy.
Latex Source Code: zip
Code Link: https://github.com/Subhodip123/LAN
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission724/Authors, auai.org/UAI/2025/Conference/Submission724/Reproducibility_Reviewers
Submission Number: 724
Loading