**FaithUnBench.json**

"FaithUnBench" is a new benchmark first introduced in our paper "Knowledge-localized Unlearning to Ensure Faithful Forgetting for Language Models".
This dataset is used to investigate and evaluate the faithfulness of unlearning methods.
We built this benchmark from Wikidata (www.wikidata.org) triples about 200 famous people.


The dataset is organized by entity (e.g., the entity "Q76" denotes "Barack Obama" in Wikidata).
Each entity comes with four types of questions: (1) "Base QA datasets", (2) "Paraphrased QA datasets", (3) "Multi-hop QA datasets", and (4) "Same-answer QA datasets".

In the "FaithUnBench.json" file, each entity includes "hop1_unlearn", "hop1_test", "hop2_test", and "same_object" questions.

"hop1_unlearn" questions correspond to "Base QA datasets" and are used in the unlearning process. Each "hop1_unlearn" question defines one cluster for evaluation.
"hop1_test" questions correspond to "Paraphrased QA datasets" and are used in the evaluation process.
"hop2_test" questions correspond to "Multi-hop QA datasets" and are used in the evaluation process.
"same_object" questions correspond to "Same-answer QA datasets" and are used in the evaluation process.

Each question also comes with "answers" and "false_answer_options", which are used in the evaluation process.
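As an illustration of the file layout, the following is a minimal Python sketch that loads "FaithUnBench.json" and counts the questions in each of the four groups. It assumes (not confirmed by this README) that the file is either a JSON object keyed by entity ID (e.g., "Q76") or a list of entity objects, and that each group key maps to a list of question records. The function names `load_benchmark` and `count_questions` are hypothetical.

```python
import json
from typing import Any, Dict

# The four question groups named in this README.
QUESTION_KEYS = ("hop1_unlearn", "hop1_test", "hop2_test", "same_object")


def load_benchmark(path: str) -> Any:
    """Load FaithUnBench.json as parsed JSON (layout is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_questions(data: Any) -> Dict[str, int]:
    """Total number of questions per group across all entities.

    Assumption: `data` is either a dict keyed by entity ID or a list of
    entity objects, and each group is stored as a list of question records.
    """
    entities = data.values() if isinstance(data, dict) else data
    counts = {key: 0 for key in QUESTION_KEYS}
    for entity in entities:
        for key in QUESTION_KEYS:
            counts[key] += len(entity.get(key, []))
    return counts
```

For example, `count_questions(load_benchmark("FaithUnBench.json"))` would report how many "hop1_unlearn", "hop1_test", "hop2_test", and "same_object" questions the file contains, provided the assumed layout holds.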




**Splits (forget set, retaining set, test set)**

We concatenate the clusters of all entities sequentially, which yields 664 clusters in total.
These 664 clusters are then split into the forget set, the retaining set, and the test set.
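The sequential concatenation can be sketched in Python as follows, assuming (as stated above) that each "hop1_unlearn" question defines one cluster, and further assuming that the order of entities in the JSON file is the intended concatenation order. The helper `build_clusters` and its tuple output format are hypothetical illustrations, not part of the released dataset.

```python
from typing import Any, List, Tuple


def build_clusters(data: Any) -> List[Tuple[str, int, Any]]:
    """Concatenate the clusters of all entities in file order.

    Each returned tuple is (entity_id, local_index, hop1_unlearn question);
    its position in the list serves as the global cluster index.
    Assumption: `data` is a dict keyed by entity ID or a list of entities
    (for a list, the entity's list position stands in for its ID).
    """
    items = data.items() if isinstance(data, dict) else enumerate(data)
    clusters: List[Tuple[str, int, Any]] = []
    for entity_id, entity in items:
        for local_idx, question in enumerate(entity.get("hop1_unlearn", [])):
            clusters.append((str(entity_id), local_idx, question))
    return clusters
```

If the assumed layout matches the released file, `len(build_clusters(data))` should equal the 664 clusters reported above, which offers a quick sanity check on the loading code.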
The indices of each set (forget set: 5%, retaining set: 10%, test set: 70%) are provided in "forget_set_split.txt", "retaining_set_split.txt", and "test_set_split.txt".
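Given the clusters in their global (concatenated) order, a split can be selected with a sketch like the one below. The format of the split files is not documented here, so the parser assumes integer indices separated by whitespace or commas (optionally wrapped in square brackets) and treats them as 0-based; both points are assumptions to check against the actual files. The helpers `parse_indices`, `read_split_indices`, and `select_clusters` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def parse_indices(text: str) -> List[int]:
    """Parse integer indices; tolerates newlines, commas, and [ ] brackets."""
    for ch in ",[]":
        text = text.replace(ch, " ")
    return [int(tok) for tok in text.split()]


def read_split_indices(path: str) -> List[int]:
    """Read one split file (e.g., "test_set_split.txt"); format is assumed."""
    with open(path, encoding="utf-8") as f:
        return parse_indices(f.read())


def select_clusters(clusters: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Pick clusters by assumed 0-based global index; fail loudly if out of range."""
    n = len(clusters)
    bad = [i for i in indices if not 0 <= i < n]
    if bad:
        raise IndexError(f"indices out of range for {n} clusters: {bad[:5]}")
    return [clusters[i] for i in indices]
```

Raising on out-of-range indices (instead of silently skipping them) makes an off-by-one mismatch, such as 1-based indices in the files, show up immediately.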