# PiCL
PiCL is built on the official implementation of [Dense Passage Retrieval for Open-Domain Question Answering][dpr_github] [(arXiv)][dpr_paper]. Refer to the [DPR GitHub repository][dpr_github] and the [Hydra documentation][hydra_doc] for details on settings and required packages.

Commands to train and test retriever models are provided in the ```commands``` folder. Modify them as needed to run the code in your environment.

To change the train/test settings for the model, modify the shell scripts in the ```commands``` folder or the configuration files in the ```conf``` folder.
1. ```train_dpr.sh```: Train the DPR model and write checkpoint files to ```outputs/checkpoint```.
2. ```generate_embeddings.sh```: Encode an input corpus and write the embeddings to ```outputs/embeddings```.
3. ```validate_retriever.sh```: Score an input question set against the corpus embeddings and write the results to ```outputs/validation```.
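
The three scripts can be chained from Python. The following is a minimal sketch, not part of this repository: it assumes the scripts run in the order listed, take no command-line arguments, and are launched from the repository root. The helper names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path

# Script names come from the list above; running them in this order and
# without arguments is an assumption.
STAGES = ("train_dpr.sh", "generate_embeddings.sh", "validate_retriever.sh")


def pipeline_commands(commands_dir="commands"):
    """Build one `bash <script>` command per stage, in list order."""
    return [["bash", str(Path(commands_dir) / name)] for name in STAGES]


def run_pipeline(commands_dir="commands", dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in pipeline_commands(commands_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; pass `dry_run=False` to execute them once the scripts are configured for your setup.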

We also share training sets with counterfactual samples for reproduction. You can run experiments with our datasets or create counterfactual samples from scratch with our preprocessing pipeline.
1. [counterfactual_train_set][cf_data]
2. [counterfactual_train_set_with_dpr_hard_negatives][adv_hn_cf_data]

[dpr_paper]: https://arxiv.org/abs/2004.04906
[dpr_github]: https://github.com/facebookresearch/DPR
[hydra_doc]: https://hydra.cc/docs/intro/


[cf_data]: https://drive.google.com/file/d/11XR25G6votBlm6QwYLNSSdjJ7s3dmIyu/view?usp=sharing
[adv_hn_cf_data]: https://drive.google.com/file/d/1dRy5swycjb9BFFn06Nw7y_pOuUBLRYAy/view?usp=sharing

Training Sample Format
------------------

<pre>
<code>
[
    {
        "question": ... ,
        "answers": [],
        "positive_ctxs": [
            {
                "title": ... ,
                "text": ... ,
                "passage_id": ...
            }
        ],
        "negative_ctxs": [
            {
                "title": ... ,
                "text": ... ,
                "passage_id": ...
            },
            ...
        ],
        "hard_negative_ctxs": [
            {
                "title": ... ,
                "text": ... ,
                "passage_id": ... 
            },
            ...
        ]
    }
]
</code>
</pre>
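
Before training on a custom file, it can help to check that every sample follows the format above. The following is a minimal sketch, not part of this repository: the helper names `check_sample` and `check_train_file` are hypothetical, and the assumption that `question` is a string is ours, since the format above elides the field values.

```python
import json

# Field names are taken from the sample format shown above.
CTX_KEYS = ("positive_ctxs", "negative_ctxs", "hard_negative_ctxs")
PASSAGE_FIELDS = ("title", "text", "passage_id")


def check_sample(sample):
    """Return a list of problems found in one training sample (empty if none)."""
    problems = []
    # Assumption: "question" holds a string.
    if not isinstance(sample.get("question"), str):
        problems.append("'question' should be a string")
    if not isinstance(sample.get("answers"), list):
        problems.append("'answers' should be a list")
    for key in CTX_KEYS:
        ctxs = sample.get(key)
        if not isinstance(ctxs, list):
            problems.append(f"'{key}' should be a list")
            continue
        for i, ctx in enumerate(ctxs):
            if not isinstance(ctx, dict):
                problems.append(f"{key}[{i}] should be an object")
                continue
            for field in PASSAGE_FIELDS:
                if field not in ctx:
                    problems.append(f"{key}[{i}] is missing '{field}'")
    return problems


def check_train_file(path):
    """Map sample index to its problems for every malformed sample in a file."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    report = {}
    for i, sample in enumerate(samples):
        problems = check_sample(sample)
        if problems:
            report[i] = problems
    return report
```

An empty dictionary from `check_train_file` means every sample has the expected keys; it does not verify that the passages are correct or that the counterfactual samples are well formed.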