## AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

### /data
This directory contains the train/dev/test splits. We use the Wikipedia snapshot from December 20th, 2018. Please download the documents (`psgs_w100.tsv`) from the [DPR repo](https://github.com/facebookresearch/DPR) and place them in this directory.

Each instance consists of `question`, `ambiguous_entity`, `qid`, and a list of `documents`. Each element in `documents` consists of `title` (the disambiguated entity), `text`, `pid` (for referencing `psgs_w100.tsv`), and `answer`.
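
As a rough illustration of the instance layout described above, the sketch below builds one made-up instance with those fields and maps each disambiguated entity to its answer. The sample values, the value types (for example, whether `qid` and `pid` are integers or strings), and the helper name `answers_by_entity` are hypothetical. It also assumes that each split is a JSON list of such instances, which is not stated here.

```python
import json

# A made-up instance using the field names listed above. The values are
# illustrative only and do not come from the dataset.
sample_split = json.loads("""
[
  {
    "qid": 0,
    "question": "When was Example founded?",
    "ambiguous_entity": "Example",
    "documents": [
      {"title": "Example (band)", "text": "...", "pid": "1", "answer": "1990"},
      {"title": "Example (company)", "text": "...", "pid": "2", "answer": "2005"}
    ]
  }
]
""")


def answers_by_entity(instance):
    """Map each disambiguated entity (document title) to its answer."""
    return {doc["title"]: doc["answer"] for doc in instance["documents"]}


for instance in sample_split:
    print(instance["ambiguous_entity"], answers_by_entity(instance))
```

If the split files follow the assumed layout, replacing `sample_split` with the result of `json.load` on a split file should work the same way.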

### /models
Please place the necessary models in this folder.

### /exp
Outputs produced during experiments, such as JSON files, will be saved in this directory.

### /src
This directory contains the code.

#### /src/generation
1. We first gather ambiguous entities and their disambiguations by leveraging Wikipedia disambiguation pages.

    ```python gather.py``` will use the Wikipedia API to search through Wikipedia disambiguation pages and save the relationships between each ambiguous entity and its disambiguations in a dictionary.

2. Next, we select two seed documents per ambiguous entity.

    ```python seed.py``` will iterate over the dictionary above. For each ambiguous entity, we compare pairs of documents belonging to distinct disambiguated entities and select the pair that exhibits the highest n-gram overlap. `/src` contains the utility code for processing the documents, taken from the [Contriever repo](https://github.com/facebookresearch/contriever).

3. Next, we generate a question and two answers from each seed pair.

    ```python generate_gpt.py [OpenAI API key]``` will generate 1500 questions with GPT-4, followed by a filtering step.

    ```python finetune_llama.py [path_to_llama_model]``` will fine-tune the Llama model with QLoRA on the GPT-4 generated questions.

    ```python generate_llama.py [path_to_finetuned_model]``` will generate the remaining questions with the fine-tuned Llama model, followed by a filtering step.

4. Finally, we expand the answer set for better coverage.

    ```python expand.py [path_to_QA_model]``` will expand the answer set, and now we have generated synthetic data!
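
The seed selection in step 2 can be sketched as follows. This is a minimal illustration, not the logic of `seed.py`: it assumes whitespace tokenization, bigrams, and a raw count of shared n-grams as the overlap measure. The function names and the toy documents are hypothetical.

```python
from itertools import combinations


def ngrams(text, n=2):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(text_a, text_b, n=2):
    """Count the n-grams shared by two texts (one possible overlap measure)."""
    return len(ngrams(text_a, n) & ngrams(text_b, n))


def select_seed_pair(docs, n=2):
    """Pick the pair of documents from distinct entities with the highest overlap.

    `docs` is a list of (entity, text) tuples. Returns a pair of such tuples,
    or None if no two documents belong to different entities.
    """
    candidates = [(a, b) for a, b in combinations(docs, 2) if a[0] != b[0]]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: overlap(pair[0][1], pair[1][1], n))


# Toy documents for one ambiguous entity ("Mercury"); the texts are made up.
docs = [
    ("Mercury (planet)", "mercury is the smallest planet in the solar system"),
    ("Mercury (element)", "mercury is a chemical element with symbol hg"),
    ("Mercury (mythology)", "mercury is a major god in roman religion"),
]
best = select_seed_pair(docs)
print(best[0][0], "|", best[1][0])
```

A set-based count ignores repeated n-grams. A normalized measure (for example, dividing by the size of the smaller n-gram set) would be a reasonable alternative if document lengths vary widely.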

#### /src/eval/ALCE
While our study mainly focuses on the `Gold Only` setting, we also experiment with a retrieved corpus. We use GTR as our retriever, with code taken from the [ALCE repo](https://github.com/princeton-nlp/ALCE). Please download the necessary pre-computed embeddings and the GTR model to use this code.

```python retrieval.py [path_to_gtr_wikipedia_index.pkl] [path_to_GTR_model] --retriever gtr --data_file ../../../data/test.json --output_file ../../../data/test_retrieved.json``` will perform retrieval.

#### /src/eval

1. ```python qa.py [mode] [model] [OpenAI API key / path_to_QA_model]``` will run inference on the test split.

    `mode` selects one of the following settings: `1: Gold Only`, `2: Gold+Retrieved`, `3: Retrieved Only`, `4: Few-shot`.

    `model` is the name of the model you are using. If the name contains "gpt", pass the OpenAI API key as the third argument. Otherwise, pass the path to the model.

2. ```sh df1.sh [mode] [model]``` will run the operations needed to compute the DF1 score.

3. ```python eval.py [mode] [model]``` will compute the Answer Recall / Entity Recall / Entity-Answer Recall / Disambig-F1 scores.
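
The argument convention described for `qa.py` can be sketched as below. This illustrates the rule only and is not taken from the script: the helper name `resolve_backend`, the returned dictionary keys, and the case-sensitive match on "gpt" are assumptions.

```python
# Mode numbers and setting names as listed in this README.
MODES = {
    1: "Gold Only",
    2: "Gold+Retrieved",
    3: "Retrieved Only",
    4: "Few-shot",
}


def resolve_backend(mode, model, credential):
    """Interpret the three positional arguments (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode}")
    # A model name containing "gpt" takes an API key as the third argument;
    # any other model takes a path to a local model.
    kind = "api_key" if "gpt" in model else "model_path"
    return {"setting": MODES[mode], "model": model, kind: credential}


print(resolve_backend(1, "gpt-4", "YOUR_API_KEY"))
print(resolve_backend(3, "llama", "/path/to/model"))
```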