# Thought-Retriever: Don't Just Retrieve Raw Data, Retrieve Thoughts



<div align="center">
  <img src="./figures/model.png" width="700" alt="Thought-Retriever method overview">
</div>






## 📌Preliminary


### Environment Setup

```shell
# create a new environment
conda create -n thought-retriever python=3.10
conda activate thought-retriever

# install pytorch. Modify the command to match your CUDA version
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 \
  --extra-index-url https://download.pytorch.org/whl/cu113

# install related libraries
pip install -r requirements.txt
```
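
After installation, a quick check like the one below can confirm that PyTorch imports and whether it sees a GPU. This is a minimal sketch, not part of the repository: the `torch_status` variable is a hypothetical name used only here, and it assumes `python` resolves to the interpreter of the activated environment.

```shell
# Sanity check (illustrative): report the installed PyTorch version and
# whether CUDA is visible. `torch_status` is a hypothetical helper variable.
if python -c "import torch" 2>/dev/null; then
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
  torch_status="ok"
else
  echo "PyTorch is not importable in this environment" >&2
  torch_status="missing"
fi
```

If the second printed value is `False`, the installed wheel probably does not match your CUDA version.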

### Dataset Preparation

Download the datasets from the following sources:

- [QMSum](https://github.com/Yale-LILY/QMSum)
- [WCEP](https://huggingface.co/datasets/ccdv/WCEP-10)
- [Booksum](https://huggingface.co/datasets/kmfoda/booksum)
- [GovReport](https://huggingface.co/datasets/ccdv/govreport-summarization/tree/refs%2Fconvert%2Fparquet/document)
- [SQuALITY](https://github.com/nyu-mll/SQuALITY)

Save the downloaded files in the `./data/[DATASET_NAME]` folder.
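
The folder layout described above could be prepared as follows. This is only a sketch: the folder names are an assumption taken from the dataset names, and the scripts may expect different spellings for `[DATASET_NAME]`, so check the code before relying on them.

```shell
# Create one folder per dataset under ./data.
# ASSUMPTION: folder names mirror the dataset names listed above; the
# repository's scripts may expect different names.
for name in QMSum WCEP Booksum GovReport SQuALITY; do
  mkdir -p "./data/${name}"
done

# List the result to confirm the layout.
ls ./data
```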

### Generate Pre-queries

Generate pre-queries with the following command:

```bash
python pre_query_generator.py \
    --root DATA_PATH \
    --api_base YOUR_API_BASE \
    --api_key YOUR_API_KEY \
    --dataset_type related_multi \
    --chunk_size 500 \
    --feat_cross \
    --with_doct5query \
    --device 0
```

## ⭐Experiments


### Run Thought-Retriever and Metric Evaluation


Run Thought-Retriever and compute the evaluation metrics with a command like the following:

```bash
python run_experiment.py \
  --root DATA_PATH \
  --api_base YOUR_API_BASE \
  --api_key YOUR_API_KEY \
  --llm_model YOUR_MODEL \
  --cuda 0 \
  --chunk_num 8 \
  --recall_coe 5 \
  --sim_thre 0.85
```
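
To compare several settings, the command above can be wrapped in a loop. The sketch below is a dry run that only prints the commands. The threshold values and the `sweep_commands.txt` file name are arbitrary examples, not settings recommended by the authors; all other flags are copied from the example above.

```shell
# Dry run: print one command per threshold instead of executing it.
# ASSUMPTION: the threshold values and the output file name are
# illustrative only.
for thre in 0.80 0.85 0.90; do
  echo python run_experiment.py \
    --root DATA_PATH \
    --api_base YOUR_API_BASE \
    --api_key YOUR_API_KEY \
    --llm_model YOUR_MODEL \
    --cuda 0 \
    --chunk_num 8 \
    --recall_coe 5 \
    --sim_thre "$thre"
done > sweep_commands.txt

cat sweep_commands.txt
```

Remove the `echo` and the redirection to run the experiments once the placeholders are filled in.
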

### Evaluation using LLM-as-a-Judge

Evaluate the answers generated by Thought-Retriever with an LLM judge:

```bash
python llm_evaluation.py \
  --api_base YOUR_API_BASE \
  --api_key YOUR_API_KEY \
  --llm_model YOUR_MODEL \
  --seed 42 \
  --tau 0
```

