## Getting Started 🎯
### Installation
```bash
# Python 3.10 is recommended.
conda create -n researcher python=3.10
conda activate researcher
pip install -e ./verl
pip install -e .
pip install wandb
pip install uvicorn
pip install bs4
```
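
After installation, a quick import check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of the repository: the helper name `check_imports` is hypothetical, and it assumes the packages above import as `verl`, `wandb`, `uvicorn`, and `bs4` inside the active `researcher` environment.

```bash
# Hypothetical helper (not part of this repo): prints every module that fails to import.
check_imports() {
  for mod in "$@"; do
    # Import errors (or a missing python binary) are reported by echoing the module name.
    python -c "import $mod" >/dev/null 2>&1 || echo "$mod"
  done
}

# Assumed import names for the packages installed above.
missing=$(check_imports verl wandb uvicorn bs4)
if [ -n "$missing" ]; then
  echo "Missing modules:" $missing
else
  echo "All packages import cleanly."
fi
```

If a module is reported as missing, re-running the matching `pip install` line inside the activated environment is the first thing to try.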

### Data
Download the [nq_hotpotqa](https://huggingface.co/datasets/PeterJinGo/nq_hotpotqa_train/tree/main) dataset, preprocessed by [Search-R1](https://github.com/PeterGriffinJin/Search-R1), and place it in [`researcher/data/`](researcher/data/).
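
One way to fetch the files from the command line is the Hugging Face CLI. This is a sketch under assumptions: it requires `pip install -U "huggingface_hub[cli]"`, the function name `download_nq_hotpotqa` is hypothetical, and it assumes the files should sit directly in `researcher/data/` rather than in a subfolder.

```bash
# Repo id taken from the dataset URL above; the target directory is the one named above.
DATASET_REPO="PeterJinGo/nq_hotpotqa_train"
DATA_DIR="researcher/data"

# Hypothetical helper (not part of this repo). Run it from the repository root.
download_nq_hotpotqa() {
  # Create the target directory, then download every file of the dataset repo into it.
  mkdir -p "$DATA_DIR" &&
  huggingface-cli download "$DATASET_REPO" --repo-type dataset --local-dir "$DATA_DIR"
}
```

Calling `download_nq_hotpotqa` starts the download. If the training scripts expect a different layout, adjust `DATA_DIR` accordingly.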


### Training Scripts
Training proceeds in two stages, warm-up followed by GRPO training:

1. Warm up (replace `[GPU_NUM]` with the number of GPUs to use):
```bash
./scripts/train/run_sft.sh [GPU_NUM]
```

2. GRPO Training:

(1) Launch the search servers
```bash
python researcher/searcher/wiki_server.py
python researcher/searcher/run_fastapi.py --local_url [DENSE_RETRIEVER_URL]
```

(2) Training and Testing
```bash
./scripts/run_grpo.sh
```
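
If the two search scripts run as long-lived servers, they need to stay up while GRPO training runs, either in separate terminals or in the background. The sketch below shows the background variant. It is an assumption-laden illustration, not the repository's own launcher: the helpers `start_bg`, `stop_all`, and `run_grpo_stage` and the log file names are hypothetical, and the retriever URL is whatever your setup provides.

```bash
# Sketch only: assumes both search scripts are long-running servers.
PIDS=""

start_bg() {
  # Usage: start_bg LOGFILE CMD [ARGS...]
  # Runs CMD in the background, sends its output to LOGFILE, and records its PID.
  log_file="$1"
  shift
  "$@" > "$log_file" 2>&1 &
  PIDS="$PIDS $!"
}

stop_all() {
  # Stops every process started through start_bg.
  if [ -n "$PIDS" ]; then
    kill $PIDS 2>/dev/null || true
  fi
  PIDS=""
}

run_grpo_stage() {
  # Usage: run_grpo_stage DENSE_RETRIEVER_URL (supply your own URL).
  start_bg wiki_server.log python researcher/searcher/wiki_server.py
  start_bg run_fastapi.log python researcher/searcher/run_fastapi.py --local_url "$1"
  # Run training in the foreground, then shut the servers down whatever the outcome.
  status=0
  ./scripts/run_grpo.sh || status=$?
  stop_all
  return "$status"
}
```

The sketch does not wait for the servers to finish loading before training starts. If they need time to become ready, add a wait or a readiness check between the `start_bg` calls and `./scripts/run_grpo.sh`.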
