# Machine-Unlearning-Analysis
This is the official repository for 'An Information Theoretic Metric for Evaluating Unlearning Models'.

## Table of Contents
1. [Getting Started](#getting-started)
2. [Quick Start](#quick-start)
3. [Dataset Preparation](#dataset-preparation)
4. [Pretraining](#pretraining)
5. [Unlearning](#unlearning)
6. [Acknowledgement](#acknowledgement)

## Getting Started
Follow these steps to build the Conda environment and install the required packages:
```bash
conda create -n mu python=3.10.0 -y
conda activate mu
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install tqdm scikit-learn fvcore transformers timm matplotlib 
```

## Quick Start
We provide the default checkpoints `ResNet18_cifar10_class_4_1_retrain0.pth` and `ResNet18_cifar10_ori0.pth` for single-class unlearning. After completing [setup](#getting-started), run the command below. It executes the `cola` method on ResNet-18 with CIFAR-10.
```bash
bash ex.sh cola
```
If this does not work, run `./cifar10.sh` in the `dataset` folder.

## Dataset Preparation
- Run the shell scripts in `dataset` to download the CIFAR-10 and CIFAR-100 datasets.
- For the ImageNet-1K dataset, preprocess the validation set by splitting its images into class folders.
- The directory should be structured as follows:
```
└── dataset
    ├── cifar10
    |   ├── train
    |   └── test
    ├── cifar100
    |   ├── train
    |   └── test
    └── imagenet
        ├── train
        └── val
             ├── ...
             └── class_folders
```
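
The validation-set split mentioned above can be sketched as follows. This is a minimal sketch, not a script shipped with this repository: it assumes a hypothetical mapping file with one `<image filename> <class folder name>` pair per line, and the function name `split_val_into_class_folders` is also hypothetical. Adapt it to whatever label source you have.

```bash
#!/usr/bin/env bash
# Sketch: move flat ImageNet validation images into per-class folders.
# Assumption: the mapping file (hypothetical format) contains lines of the form
#   <image filename> <class folder name>

split_val_into_class_folders() {
    local val_dir=$1   # e.g. dataset/imagenet/val
    local mapping=$2   # path to the hypothetical mapping file
    local file class
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while read -r file class || [ -n "$file" ]; do
        # Skip blank or incomplete lines.
        [ -n "$file" ] && [ -n "$class" ] || continue
        # Skip images that are missing or were already moved.
        [ -f "$val_dir/$file" ] || continue
        mkdir -p "$val_dir/$class"
        mv "$val_dir/$file" "$val_dir/$class/"
    done < "$mapping"
}
```

For example, `split_val_into_class_folders dataset/imagenet/val val_labels.txt` would produce the `val/class_folders` layout shown above, where `val_labels.txt` stands for your own mapping file.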

## Pretraining
- Run the shell scripts in the `pretraining` folder to create the original and retrain models.
- The model weights will be saved in the `checkpoints` folder.
1. CIFAR-10/CIFAR-100 Pretraining
```bash
bash pretraining/pretrain.sh
```
#### Hyperparameters
- TEST_MODE: the unlearning mode. `class` and `sample` are available (sub-class is not available yet).
- CLASS_IDX: used with `class`; the class index at which unlearning starts.
- CLASS_IDX_UNLEARN: used with `class`; the number of classes to forget.
- SAMPLE_UNLEARN_PER_CLASS: used with `sample`; the number of random samples to forget from each class.
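
To make the `class` parameters concrete, the sketch below lists the class indices a given setting would cover. It assumes the forgotten classes form a contiguous range starting at CLASS_IDX, which is our reading of the descriptions above and is not confirmed by the scripts. `forget_classes` is a hypothetical helper, not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: print the class indices covered by CLASS_IDX / CLASS_IDX_UNLEARN.
# Assumption: classes are forgotten as a contiguous range starting at CLASS_IDX.

forget_classes() {
    local start=$1   # CLASS_IDX
    local count=$2   # CLASS_IDX_UNLEARN
    local i out=""
    for ((i = start; i < start + count; i++)); do
        # Add a separating space before every index except the first.
        out+="${out:+ }$i"
    done
    echo "$out"
}
```

Under that assumption, `forget_classes 4 1` prints the single index for CLASS_IDX=4 and CLASS_IDX_UNLEARN=1, and `forget_classes 4 3` prints three consecutive indices starting at 4.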

2. ImageNet-1K Pretraining

#### [Notice] Download ImageNet and place it in `dataset/imagenet` before pretraining.
- ResNet-50 Pretraining
```bash
bash pretraining/imagenet_resnet50.sh 
```
- ViT Pretraining
```bash
bash pretraining/imagenet_vit.sh 
```
* Set `--unlearning_type` to `original`, `class`, or `sample` to create the original model, the class-wise forgetting retrain model, or the random data forgetting retrain model, respectively.

## Unlearning
To perform unlearning and evaluate the unlearned model, run the `ex.sh` script.
### Basic Usage
Run the following command, replacing `$method_name` with the desired unlearning method:
```bash
bash ex.sh $method_name
```

**Example**:
```bash
bash ex.sh cola
```
Available methods: `ft`, `rl`, `teacher`, `neggradp`, `eu5`, `cf5`, `eu10`, `cf10`, `scrub`, `salun`, `sparse`, `cola`
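
To compare several methods, the command above can be wrapped in a loop. The sketch below is one way to do that. `run_methods` and the `RUNNER` override (useful for a dry run) are hypothetical names introduced here, not part of `ex.sh`.

```bash
#!/usr/bin/env bash
# Sketch: run several unlearning methods back to back.
# RUNNER is a hypothetical override, e.g. RUNNER="echo bash ex.sh" for a dry run.

run_methods() {
    local runner="${RUNNER:-bash ex.sh}"
    local method
    for method in "$@"; do
        echo "=== $method ==="
        # $runner is left unquoted on purpose so it splits into command and arguments.
        $runner "$method" || echo "method '$method' failed" >&2
    done
}
```

For example, `run_methods ft rl cola` runs the three methods in sequence and keeps going if one of them fails, while `RUNNER="echo bash ex.sh" run_methods ft rl cola` only prints the commands.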

### Configuration
Before running the script, adjust the following configuration parameters in the ex.sh script as needed:

1. **Devices and Seeds**:
- **DEVICES**: List of GPU devices to use. E.g., DEVICES=(0 1 2).
- **SEEDS**: List of random seeds for reproducibility. E.g., SEEDS=(0 1 2).
- **MODEL_SEEDS**: List of seeds for model initialization. E.g., MODEL_SEEDS=(0 1 2).

2. **Experiment Parameters**:
- **DATASET**: Dataset to use (cifar10, cifar100, imagenet).
- **MODEL_NAME**: Name of the model (ResNet18, ResNet50, ViT).
- **TEST_MODE**: Mode of unlearning (class, sample, or sub_class).

3. **Unlearning Specific Parameters**:
- **CLASS_IDX**: Class index to target (for TEST_MODE=class).
- **CLASS_IDX_UNLEARN**: Number of class indices to unlearn (for TEST_MODE=class).
- **SUB_CLASS_NAME**: Sub-class name for random data forgetting (for TEST_MODE=sub_class).
- **SAMPLE_UNLEARN_PER_CLASS**: Number of samples to unlearn per class (for TEST_MODE=sample).

4. **Method Hyperparameters**:
- **METHOD**: Method to use (lowercase method name. E.g., ft, neggradp, cola).
- **REMAIN_EPOCHS**: Number of epochs for training on the remaining (retain) set.
- **FORGET_EPOCHS**: Number of epochs for training on the forget set. Usually 0.
- **REMAIN_BATCH_SIZE**: Batch size for retain set.
- **FORGET_BATCH_SIZE**: Batch size for forget set.
- **LR**: Learning rate (e.g., 5e-4).
- **OPTIMIZER**: Optimizer to use (adam or sgd).

5. **Options**:
- **SAVE_RESULT_MODEL**: Option to save the resulting model.
- **EVAL_MODE**: Uncomment to run only evaluation summary.

### Example Configuration
Here is an example configuration snippet for COLA:
```bash
# Devices and Seeds
DEVICES=(0)
SEEDS=(0)
MODEL_SEEDS=(0)

# Experiment Parameters
DATASET=cifar10
MODEL_NAME=ResNet18
TEST_MODE=class

# Class and Sub-class Specific Parameters
CLASS_IDX=4
CLASS_IDX_UNLEARN=1
SUB_CLASS_NAME="sea"
SAMPLE_UNLEARN_PER_CLASS=50

# Method Hyperparameters
METHOD=$1
REMAIN_EPOCHS=10
FORGET_EPOCHS=10
REMAIN_BATCH_SIZE=64
FORGET_BATCH_SIZE=16
LR=5e-4
OPTIMIZER="adam"

# Options
SAVE_RESULT_MODEL="--save_result_model"
# EVAL_MODE="--eval_mode"

```
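
Before launching a run, it can help to confirm that the checkpoints implied by the configuration exist. The sketch below assumes a naming pattern inferred from the Quick Start files (`ResNet18_cifar10_ori0.pth` and `ResNet18_cifar10_class_4_1_retrain0.pth`) and the `checkpoints` folder, and it only covers `TEST_MODE=class`. The helper names and the pattern are assumptions, not confirmed behavior of `ex.sh`.

```bash
#!/usr/bin/env bash
# Sketch: build expected checkpoint paths and check that they exist.
# Assumption: file names follow the pattern seen in the Quick Start checkpoints.

original_ckpt() {   # args: model dataset seed
    echo "checkpoints/${1}_${2}_ori${3}.pth"
}

retrain_ckpt() {    # args: model dataset class_idx class_idx_unlearn seed
    echo "checkpoints/${1}_${2}_class_${3}_${4}_retrain${5}.pth"
}

check_ckpts() {     # args: one or more checkpoint paths
    local f missing=0
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing checkpoint: $f" >&2; missing=1; }
    done
    # Return non-zero if any checkpoint is missing.
    return "$missing"
}
```

With the configuration above, `check_ckpts "$(original_ckpt ResNet18 cifar10 0)" "$(retrain_ckpt ResNet18 cifar10 4 1 0)" && bash ex.sh cola` would start the run only if both files are present.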

## Acknowledgement

We built upon the following codebases:
- https://github.com/vikram2000b/bad-teaching-unlearning
- https://github.com/meghdadk/SCRUB
- https://github.com/OPTML-Group/Unlearn-Sparse
- https://paperswithcode.com/paper/boundary-unlearning (no code repository; only a zip file is available)
