## Setup

All experiments were done using Python 3.10.16. We additionally use the following packages and versions:

```
pyserini==1.0.0
pandas==2.3.0
transformers==4.52.4
trl==0.19.0
datasets==3.2.0
torch==2.7.0
vllm==0.9.1
tqdm==4.67.1
accelerate==1.8.1
```
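As a quick sanity check before training, the pinned versions above can be compared against what is installed. This is a minimal sketch using only the standard library's `importlib.metadata`; the `PINNED` dictionary and the `find_mismatches` helper are illustrative names and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {
    "pyserini": "1.0.0",
    "pandas": "2.3.0",
    "transformers": "4.52.4",
    "trl": "0.19.0",
    "datasets": "3.2.0",
    "torch": "2.7.0",
    "vllm": "0.9.1",
    "tqdm": "4.67.1",
    "accelerate": "1.8.1",
}


def find_mismatches(pinned, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Builds such as "2.7.0+cu126" carry a local suffix; compare without it.
        base = installed.split("+")[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package matches its pin.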

## Step 0: Train StandardRR and ReasonRR

To do this, we first download the [Rank1 training data](https://huggingface.co/datasets/jhu-clsp/rank1-training-data).
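One way to fetch the dataset is with `huggingface_hub.snapshot_download`. This is a sketch, not a script shipped with this repository: it assumes `huggingface_hub` is installed (it is normally pulled in by `transformers` and `datasets`), and `download_rank1_data` is a hypothetical helper name.

```python
REPO_ID = "jhu-clsp/rank1-training-data"


def download_rank1_data(local_dir):
    """Download the full dataset repository and return the local path."""
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,
    )
```

Calling `download_rank1_data("/path/to/rank1-training-data")` places the files in that directory, which is the location the `--sft_data_path` argument in the commands that follow should point into.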

To LoRA fine-tune StandardRR using Qwen2.5-7B:

```
python train_reranker_sft.py \
    --base_model Qwen/Qwen2.5-7B \
    --sft_data_path </path/to/rank1-training-data/train.json> \
    --output_model_path </output/path/> \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 64 \
    --learning_rate 2e-4 \
    --lora_r 32 \
    --lora_alpha 64 \
    --epochs 1 \
    --remove_reasoning
```

To LoRA fine-tune your own Rank1 using Qwen2.5-7B, we use the same command, but remove the ```--remove_reasoning``` flag:

```
python train_reranker_sft.py \
    --base_model Qwen/Qwen2.5-7B \
    --sft_data_path </path/to/rank1-training-data/train.json> \
    --output_model_path </output/path/> \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 64 \
    --learning_rate 2e-4 \
    --lora_r 32 \
    --lora_alpha 64 \
    --epochs 1
```

## Step 1: (BRIGHT ONLY) Set Up BRIGHT Results

To replicate our results on BRIGHT, follow the steps below. Otherwise, feel free to skip to Step 2!

### Step 1.1: Generate BRIGHT Qrels

First, we need to convert the BRIGHT relevance judgments into TREC format so that we can evaluate them with Pyserini.

To do so, simply run:

```
python write_pyserini_qrels.py
```

This should save all of the qrel files to ```pyserini_qrels/```.
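For readers unfamiliar with the format, a TREC qrels file has one judgment per line with four columns: query ID, an unused iteration field (conventionally `0`), document ID, and relevance label. The sketch below illustrates that layout only; it is not the code in `write_pyserini_qrels.py`. The helper names are hypothetical, and the tab separator is an assumption based on the `.tsv` extension used for the qrels files.

```python
def format_qrels(qrels):
    """Turn {query_id: {doc_id: relevance}} into TREC qrels lines."""
    lines = []
    for qid in sorted(qrels):
        for docid in sorted(qrels[qid]):
            rel = int(qrels[qid][docid])
            # Columns: query ID, iteration (unused, always 0), doc ID, relevance.
            lines.append(f"{qid}\t0\t{docid}\t{rel}")
    return lines


def write_qrels(qrels, path):
    """Write the formatted judgments to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(format_qrels(qrels)) + "\n")
```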

### Step 1.2: Generate first-stage retrieval run files for BRIGHT dataset

**NOTE:** First-stage retrieval run files can also be downloaded from https://github.com/ielab/llm-rankers/tree/main/Rank-R1/bright/bm25_gpt4_run, and they should work with our codebase. For your convenience, we have included them under the `bm25_gpt4_run` directory.

Follow the steps in the [BRIGHT Repo](https://github.com/xlang-ai/BRIGHT) Evaluation section to get their BM25 run files.

For reference, this is the command we used to get the BM25 + GPT4-reasoning results:
```
#!/bin/bash

for task in biology earth_science economics psychology robotics stackoverflow sustainable_living pony leetcode aops theoremqa_theorems theoremqa_questions; do
    python run.py \
        --task "$task" \
        --model bm25 \
        --reasoning gpt4
done
```

The code below was written to work with any run files from BRIGHT, so feel free to run your favorite first-stage model 😄
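To inspect a run file before reranking, a small reader like the one below can help. It is a sketch that assumes the standard six-column TREC run layout (`query_id Q0 doc_id rank score tag`); the actual loading logic in `rerank.py` may differ, and `read_trec_run` is a hypothetical name.

```python
from collections import defaultdict


def read_trec_run(lines, k=100):
    """Return {query_id: [(doc_id, score), ...]} keeping the top-k by score."""
    per_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, _rank, score, _tag = parts
        per_query[qid].append((docid, float(score)))
    return {
        qid: sorted(docs, key=lambda d: d[1], reverse=True)[:k]
        for qid, docs in per_query.items()
    }
```

Passing an open file handle as `lines` works, since the function only iterates over it.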


## Step 2: Rerank!

Now, we should have trained models that we are ready to evaluate.

To run any of the models (StandardRR (StandardRanker), Rank1 (our impl.), Rank1-NoReason, or Rank1), simply run ```rerank.py```. Below is an example command to run StandardRanker (Qwen2.5-7B) on theoremqa_questions:

```
corpus=theoremqa_questions
model=Qwen/Qwen2.5-7B
lora_module=/path/to/Qwen2.5-7B_standardrr
model_name=standardrr
mkdir -p result_files/$corpus

output_filename=result_files/$corpus/${corpus}_bm25_rerank_${lora_module##*/}

echo evaluating: $corpus , lora module: $lora_module
python rerank.py \
    --model_path $model \
    --model_name $model_name \
    --lora_module $lora_module \
    --corpus_name $corpus \
    --bright_run_file <path/to/bm25_gpt4_run/${corpus}_bm25_long_False/trec.txt> \
    --qrels_path <path/to/pyserini_qrels/${corpus}.tsv> \
    --k 100 \
    --output_filename $output_filename
```
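The `${lora_module##*/}` expansion in the command above strips everything up to and including the last `/`, so the output file is named after the final component of the LoRA module path. The same naming can be reproduced in Python as follows; `output_filename` is a hypothetical helper used only for illustration.

```python
def output_filename(corpus, lora_module, root="result_files"):
    """Mirror result_files/$corpus/${corpus}_bm25_rerank_${lora_module##*/}."""
    # "##*/" removes the longest prefix ending in "/", keeping the last component.
    module_name = lora_module.rsplit("/", 1)[-1]
    return f"{root}/{corpus}/{corpus}_bm25_rerank_{module_name}"


print(output_filename("theoremqa_questions", "/path/to/Qwen2.5-7B_standardrr"))
```

As in the shell version, a path with a trailing slash yields an empty module name, so pass the LoRA directory without one.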

For Rank1 (our impl.) (Qwen2.5-7B):

```
corpus=theoremqa_questions
model=Qwen/Qwen2.5-7B
lora_module=/path/to/Qwen2.5-7B_reasonrr
model_name=rank1_our_impl
mkdir -p result_files/$corpus

output_filename=result_files/$corpus/${corpus}_bm25_rerank_${lora_module##*/}

echo evaluating: $corpus , lora module: $lora_module
python rerank.py \
    --model_path $model \
    --model_name $model_name \
    --lora_module $lora_module \
    --corpus_name $corpus \
    --bright_run_file <path/to/bm25_gpt4_run/${corpus}_bm25_long_False/trec.txt> \
    --qrels_path <path/to/pyserini_qrels/${corpus}.tsv> \
    --k 100 \
    --output_filename $output_filename
```

NOTE: For non-BRIGHT datasets, you do not need to pass `--bright_run_file` or `--qrels_path`. Those arguments can be left out.

To use another model, pass the corresponding LoRA module or model directory and set `model_name` to one of:

- `standardrr` (meant to work with _standardrr LoRA module)
- `rank1_our_impl` (meant to work with _reasonrr LoRA module)
- `rank1` (meant to work with any jhu-clsp/rank1 model)
- `rank1_noreason` (meant to work with any jhu-clsp/rank1 model)

To run on another dataset, simply replace `corpus` with one of the following:

- `dl19`
- `dl20`
- `dl21`
- `dl22`
- `dl23`
- `theoremqa_questions`
- `theoremqa_theorems`
- `aops`
- `leetcode`
- `pony`
- `sustainable_living`
- `stackoverflow`
- `robotics`
- `psychology`
- `economics`
- `earth_science`
- `biology`
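
To evaluate one model across several datasets, the `rerank.py` invocation can be assembled programmatically. The sketch below only uses the flags that appear in the example commands in this README; `build_rerank_command`, `run_root`, and `qrels_root` are hypothetical names, and the BRIGHT-only arguments are omitted for the `dl*` datasets.

```python
import shlex

BRIGHT_TASKS = [
    "biology", "earth_science", "economics", "psychology", "robotics",
    "stackoverflow", "sustainable_living", "pony", "leetcode", "aops",
    "theoremqa_theorems", "theoremqa_questions",
]


def build_rerank_command(corpus, model_path, model_name, lora_module,
                         run_root=None, qrels_root=None, k=100):
    """Return the argument list for one rerank.py run."""
    module_name = lora_module.rsplit("/", 1)[-1]
    cmd = [
        "python", "rerank.py",
        "--model_path", model_path,
        "--model_name", model_name,
        "--lora_module", lora_module,
        "--corpus_name", corpus,
        "--k", str(k),
        "--output_filename",
        f"result_files/{corpus}/{corpus}_bm25_rerank_{module_name}",
    ]
    if corpus in BRIGHT_TASKS:
        # BRIGHT needs the first-stage run file and the generated qrels.
        cmd += [
            "--bright_run_file", f"{run_root}/{corpus}_bm25_long_False/trec.txt",
            "--qrels_path", f"{qrels_root}/{corpus}.tsv",
        ]
    return cmd


if __name__ == "__main__":
    for corpus in ["dl19", "biology"]:
        cmd = build_rerank_command(
            corpus, "Qwen/Qwen2.5-7B", "standardrr",
            "/path/to/Qwen2.5-7B_standardrr",
            run_root="bm25_gpt4_run", qrels_root="pyserini_qrels",
        )
        print(shlex.join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` after creating the `result_files/<corpus>` directory.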
