## 1. Installation

First, create a virtual environment named `sps` using `conda`, and install the required dependencies:

```bash
conda create -n sps python
conda activate sps
pip install -r requirements.txt
```

## 2. Retriever Setup

By default, we use [Contriever](https://github.com/facebookresearch/contriever) as our retrieval component.

### Download Data

Download preprocessed passage data used in DPR:

```bash
cd data
wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
```

Then, download the precomputed passage embeddings. We use the embeddings produced by [Contriever-MSMARCO](https://huggingface.co/facebook/contriever-msmarco):

```bash
cd data
wget https://dl.fbaipublicfiles.com/contriever/embeddings/contriever-msmarco/wikipedia_embeddings.tar
```

Extract the archive (for example with `tar -xf wikipedia_embeddings.tar`) to obtain the `wikipedia_embeddings` files.
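To check that `psgs_w100.tsv.gz` downloaded correctly, the sketch below streams rows from the file without decompressing it to disk. It assumes the tab-separated layout of the DPR release (a header row followed by `id`, `text`, and `title` columns); `iter_passages` is a hypothetical helper used only for illustration and is not part of this repository.

```python
import csv
import gzip
import itertools


def iter_passages(path, limit=None):
    """Yield (id, title, text) tuples from a gzipped DPR-style passage TSV.

    Assumption: the file has a header row with `id`, `text`, and `title`
    columns. Adjust the field names if your file differs.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # islice stops after `limit` rows; limit=None reads the whole file.
        for row in itertools.islice(reader, limit):
            yield row["id"], row["title"], row["text"]
```

For example, `list(iter_passages("data/psgs_w100.tsv.gz", limit=3))` returns the first three passages.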



## 3. SPS Variants: Online and Offline Retrieval

We provide two versions of the SPS system:

- **Online Retrieval (`sps-online.py`)**: Performs real-time document retrieval during generation.
- **Offline Retrieval**: Uses documents that are pre-retrieved and cached ahead of time.

### 3.1 Running the Online Version

The script `sps-online.py` implements the online retrieval version. It relies on helper functions defined in `speculative_online.py`.

```bash
python sps-online.py \
    --input_data_path ./data/asqa_eval_gtr_top100.json \
    --num_max_new_tokens 50 \
    --m 5 \
    --k 5 \
    --n 10 \
    --output_path output.jsonl
```

This configuration sets the chunk size to 50 tokens (`--num_max_new_tokens 50`), retrieves the top 10 documents (`--n 10`), and creates 5 document subsets (`--m 5`), each containing 5 documents (`--k 5`).
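The relationship between `--n`, `--m`, and `--k` can be illustrated with a small sketch. It assumes the subsets are drawn uniformly at random, without replacement inside each subset, from the top-`n` documents; the actual strategy in `speculative_online.py` may differ, and `sample_subsets` is a hypothetical name used only for illustration.

```python
import random


def sample_subsets(ranked_docs, n=10, m=5, k=5, seed=0):
    """Draw m subsets of k documents each from the top-n ranked documents."""
    top_n = ranked_docs[:n]
    if k > len(top_n):
        raise ValueError("k cannot exceed the number of available documents")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    # Documents are unique within a subset but may recur across subsets.
    return [rng.sample(top_n, k) for _ in range(m)]


docs = [f"doc{i}" for i in range(100)]  # stand-in for retrieved passages
subsets = sample_subsets(docs, n=10, m=5, k=5)
print(len(subsets), [len(s) for s in subsets])  # prints: 5 [5, 5, 5, 5, 5]
```

With these values, every subset is drawn from the same pool of 10 documents, so subsets can overlap.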

### 3.2 Running the Offline Version

For the offline version, we store the pre-retrieved documents in separate folders for each dataset. You can run the offline pipeline by pointing to these directories.

## 4. Parameter Description

| Parameter               | Description                                                                 |
|-------------------------|-----------------------------------------------------------------------------|
| `--input_data_path`     | Path to the input dataset, including retrieved documents.                   |
| `--num_max_new_tokens`  | Maximum number of new tokens to generate in each chunk (e.g., 50).          |
| `--m`                   | Number of subsets to sample from the retrieved documents.                   |
| `--k`                   | Number of documents in each subset.                                         |
| `--n`                   | Number of top documents to retrieve per query.                              |
| `--output_path`         | Path to save the final output in JSONL format.                              |
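For reference, the flags in the table above could be declared with `argparse` roughly as follows. This is a minimal sketch rather than the parser used by the scripts: the types, defaults, and the `k <= n` check are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags listed in the parameter table."""
    parser = argparse.ArgumentParser(description="SPS generation (illustrative sketch)")
    parser.add_argument("--input_data_path", type=str, required=True,
                        help="Input dataset, including retrieved documents.")
    parser.add_argument("--num_max_new_tokens", type=int, default=50,
                        help="Tokens to generate in each chunk.")
    parser.add_argument("--m", type=int, default=5,
                        help="Number of document subsets.")
    parser.add_argument("--k", type=int, default=5,
                        help="Number of documents in each subset.")
    parser.add_argument("--n", type=int, default=10,
                        help="Number of top documents to retrieve per query.")
    parser.add_argument("--output_path", type=str, required=True,
                        help="Where to write the JSONL output.")
    return parser


def parse_sps_args(argv=None):
    """Parse arguments and reject subset sizes larger than the retrieved pool."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.k > args.n:  # assumed constraint: a subset cannot exceed the pool
        parser.error("--k must not exceed --n")
    return args


args = parse_sps_args([
    "--input_data_path", "./data/asqa_eval_gtr_top100.json",
    "--k", "3", "--n", "5",
    "--output_path", "output.jsonl",
])
print(args.m, args.k, args.n)  # prints: 5 3 5
```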

## 5. Dataset-Specific Usage

### 5.1 Long-Form QA: ASQA

For the long-form question answering dataset **ASQA**, we provide two versions:

- `sps-asqa-mis.py`: Uses the Mistral model for generation.
- `sps-asqa-alpaca.py`: Uses the Alpaca model for generation.

The only difference between the two scripts is the prompt formatting, which is adjusted to match each model's instruction style.

To run the Mistral version, use the following command:

```bash
python sps-asqa-mis.py \
    --input_data_path ./data/asqa_eval_gtr_top100.json \
    --num_max_new_tokens 50 \
    --m 5 \
    --k 3 \
    --n 5 \
    --output_path YOUR_OUTPUT.jsonl
```
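After a run finishes, it can help to sanity-check the output file before evaluation. The sketch below assumes only that the file is JSONL, as stated in Section 4, with one JSON object per line. The field names written by the scripts are not documented here, so the hypothetical helper `summarize_jsonl` just reports the record count and which top-level keys appear.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return the record count and how often each top-level key appears."""
    key_counts = Counter()
    num_records = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            num_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return num_records, dict(key_counts)
```

Calling `summarize_jsonl("YOUR_OUTPUT.jsonl")` returns a `(count, keys)` pair. A count that differs from the number of input questions suggests an interrupted run.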

### 5.2 Short-Form QA

For short-form QA tasks, we follow the single-round generation setup used in Self-RAG, since the answers are typically brief.

#### Example: TriviaQA

```bash
python sps-tri-mis.py \
    --input_data_path ./data/triviaqa_test.jsonl \
    --num_max_new_tokens 200 \
    --m 5 \
    --k 5 \
    --n 10 \
    --output_path YOUR_OUTPUT.jsonl
```

### 5.3 Closed-Set QA

#### Example: ARC-C

```bash
python sps-arc-mis.py \
    --input_data_path ./data/arc_challenge_processed.jsonl \
    --num_max_new_tokens 200 \
    --m 5 \
    --k 5 \
    --n 10 \
    --output_path YOUR_OUTPUT.jsonl
```

## 6. Evaluation

For the ASQA dataset, we use the official evaluation script provided by the [ALCE](https://github.com/princeton-nlp/ALCE) repository.

We provide the outputs generated with the Mistral-Instruct-7B model in `data/asqa-mis-top5-m5-k3.json`.

To run the evaluation, clone the ALCE repository and run the following command from the directory that contains `eval.py`, adjusting the `--f` path if the output file is stored elsewhere:

```bash
python eval.py --f asqa-mis-top5-m5-k3.json --qa --mauve
```
