# Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check

This codebase builds largely on the TOFU benchmark (https://github.com/locuslab/tofu). To set up the environment, follow the instructions in the TOFU repository.

## Instructions
- Fine-tune the model on the full or retain-only dataset: `run_finetune.sh`
- Run machine unlearning (currently supports GA, GD, NPO, DPO, and IHL): `run_forget.sh`
- Evaluate using the original metrics from TOFU: `run_evaluate.sh` and `run_aggregate.sh`
- Generate samples from each model: `run_save_generations.sh`
- Compute the bidirectional likelihoods required for FADE: `run_compute_likelihoods.sh`
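
The scripts above can be chained into a single run. The following is a minimal sketch, not part of the repository: it assumes the scripts live in the current directory, take no command-line arguments, and should run in the order listed (each script's internal settings, such as model and dataset paths, may need editing first). The `STEPS` array, the `run_pipeline` function, and the `DRY_RUN` variable are hypothetical names introduced for illustration. By default the sketch only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Script names come from the list above. The ordering and the lack of
# arguments are assumptions; check each script before running it.
STEPS=(
  run_finetune.sh
  run_forget.sh
  run_evaluate.sh
  run_aggregate.sh
  run_save_generations.sh
  run_compute_likelihoods.sh
)

# Hypothetical switch: 1 prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run_pipeline() {
  local step
  for step in "${STEPS[@]}"; do
    if [[ "${DRY_RUN}" == "1" ]]; then
      # Dry run: show the command without executing it.
      echo "bash ${step}"
    else
      echo ">>> ${step}"
      # With `set -e`, a failing step stops the whole pipeline.
      bash "${step}"
    fi
  done
}

run_pipeline
```

Saved under a name of your choosing, the sketch executes the steps for real when invoked with `DRY_RUN=0`. Stopping at the first failure is a deliberate choice here, since the later steps presumably depend on the checkpoints produced by the earlier ones.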