# CoRAG (Chain-of-Retrieval Augmented Generation)

## Requirements

```bash
pip install -r requirements.txt
```

## How to Run

Here we provide an example for running inference with CoRAG-8B on the MultihopQA dataset.
We tested this on a machine with 8 A100 GPUs (40GB).

1. Download embeddings and start the E5 search server.

```bash
bash scripts/download_embeddings.sh

# The server logs will be in e5_server.log
bash scripts/start_e5_server.sh
```

2. Start the vLLM server and load the CoRAG-8B model.

```bash
# The server logs will be in vllm_server.log
bash scripts/start_vllm_server.sh /path/to/model
```

3. Run the inference script. By default, it uses greedy decoding with a maximum path length of `L = 6`.

```bash
# It will evaluate on [2wikimultihopqa, bamboogle, hotpotqa, musique] sequentially.
bash scripts/eval_multihopqa.sh
```
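
To make the `L = 6` setting concrete, here is a rough sketch of a retrieval chain bounded by a maximum path length. It is an illustration under stated assumptions, not the code in this repository: `run_chain`, the three callables it takes, and the empty-string stop signal are all hypothetical names and conventions chosen for the example.

```python
from typing import Callable, List, Tuple

Chain = List[Tuple[str, str]]  # list of (sub_query, sub_answer) pairs


def run_chain(
    question: str,
    next_sub_query: Callable[[str, Chain], str],     # hypothetical: LLM proposes the next sub-query
    retrieve: Callable[[str], List[str]],            # hypothetical: search server lookup
    answer_sub_query: Callable[[str, List[str]], str],  # hypothetical: LLM answers from the docs
    max_path_length: int = 6,
) -> Chain:
    """Build a chain of at most `max_path_length` retrieval steps."""
    chain: Chain = []
    for _ in range(max_path_length):
        sub_query = next_sub_query(question, chain)
        if not sub_query:
            # Assumed convention: an empty sub-query ends the chain early.
            break
        docs = retrieve(sub_query)
        chain.append((sub_query, answer_sub_query(sub_query, docs)))
    return chain
```

In this sketch, greedy decoding would correspond to each callable returning its single most likely output, so one chain is produced per question.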

When the script finishes, it prints evaluation metrics similar to the following (shown here for the MuSiQue dataset):

```json
{
    "em": 27.679,
    "f1": 38.532,
    "accuracy": 27.141,
    "num_samples": 2417,
    "max_path_length": 6,
    "decode_strategy": "greedy",
    "token_consumed": 23818600,
    "average_token_consumed_per_sample": 9854.613156805957
}
```
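
For readers unfamiliar with the fields, the sketch below shows how `em` and `f1` are commonly computed for question answering (SQuAD-style answer normalization), and how the per-sample token average follows from the two token fields. This is an assumption about the usual definitions; the evaluation script in this repository may normalize or aggregate differently, and the `accuracy` field is not covered here.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# The per-sample average in the JSON above is the total divided by the sample count.
average_tokens = 23818600 / 2417
```

The reported scores are on a 0-100 scale, so per-example values like these would be averaged over the dataset and multiplied by 100.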

Due to the randomness of the sampling process, results may vary slightly between runs, especially on small datasets like Bamboogle.
